Scientific and technological production

  • Monocular Depth Estimation in Images and Sequences Using Occlusion Cues

     Palou Visa, Guillem
    Defense's date: 2014-02-21
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    When humans observe a scene, they can readily distinguish the parts that compose it and organize them spatially in order to orient themselves. The mechanisms that govern visual perception have been studied since the beginnings of neuroscience, but not all the biological processes involved are yet known. In normal situations, humans can use three tools to estimate the structure of a scene. The first is the so-called divergence. It exploits two points of view (the two eyes) and can determine very accurately the position of objects that lie in front of the observer at distances of up to a hundred meters. As the distance increases, or when objects are not within the field of view of both eyes, other mechanisms must be used. Both prior experience and certain visual cues are used in these cases and, although their precision is lower, humans almost always manage to interpret their surroundings correctly. The best-known and most widely used visual cues that provide depth information are, for example, perspective, occlusions and the size of certain objects. Prior experience makes it possible to resolve previously seen situations, such as knowing which regions correspond to the ground, the sky or objects. In recent years, as technology has allowed, attempts have been made to design systems that automatically interpret different types of scene. This thesis addresses depth estimation using only one point of view and occlusion visual cues. The goal of the work is to detect these cues and combine them with a segmentation system in order to automatically generate the different depth planes present in a scene. The thesis explores both static situations (still images) and dynamic situations, such as frames within video sequences or complete sequences. For complete sequences, an automatic system is also proposed to reconstruct the structure of the scene using motion information alone. The results of the work are promising and competitive with the literature of the time, but they also show that computer vision still has much room for improvement relative to the precision of humans.

  • Foreground objects segmentation for moving camera scenarios based on SCGMM

     Gallego, Jaime; Pardas Feliu, Montserrat; Solano, Montse
    Date of publication: 2013-01-01
    Book chapter

    In this paper we present a new system for segmenting non-rigid objects in moving camera sequences for indoor and outdoor scenarios that achieves a correct object segmentation via a global MAP-MRF framework formulation for the foreground and background classification task. Our proposal, suitable for video indexing applications, receives as input an initial segmentation of the object to segment, and it consists of two region-based parametric probabilistic models that model the spatial (x,y) and color (r,g,b) domains of the foreground and background classes. Both classes compete with each other in modeling the regions that appear within a dynamic region of interest that includes the foreground object to segment and also the background regions that surround the object. The results presented in the paper show the correctness of the object segmentation, reducing false positive and false negative detections originated by the new background regions that appear near the region of the object.

  • Region based foreground segmentation combining color and depth sensors via logarithmic opinion pool decision

     Gallego Vila, Jaime; Pardas Feliu, Montserrat
    Journal of visual communication and image representation
    Date of publication: 2013-04-01
    Journal article

    In this paper we present a novel foreground segmentation system that combines color and depth sensor information to perform a more complete Bayesian segmentation between foreground and background classes. The system combines spatial-color and spatial-depth region-based models for the foreground with color and depth pixel-wise models for the background, in a Logarithmic Opinion Pool decision framework used to correctly combine the likelihoods of each model. A posterior enhancement step based on a trimap analysis is also proposed in order to correct the precision errors that the depth sensor introduces. The results presented in this paper show that our system is robust to color and depth camouflage problems between the foreground object and the background, and also improves the segmentation in the area of the objects' contours by reducing the false positive detections that appear due to the lack of precision of the depth sensors.

  • Parametric Region-Based Foreground Segmentation in Planar and Multi-View Sequences

     Gallego Vila, Jaime
    Defense's date: 2013-10-14
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Human body analysis using depth data.

     Suau Cuadros, Xavier
    Defense's date: 2013-12-04
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    Human body analysis is one of the broadest areas in the field of computer vision. Researchers have put great effort into human body analysis, especially over the last decade, thanks to major technological advances in both cameras and computing power. Human body analysis encompasses several topics, such as person detection and segmentation, body motion tracking, and action recognition. Although humans perform these tasks naturally, they become a difficult problem when approached from the perspective of computer vision. Adverse conditions, such as the viewpoint perspective, occlusions, lighting conditions or the variability of behavior between people, make human body analysis a complicated task. In computer vision, the evolution of research is often tied to technological progress, in both sensors and the computing power of computers. Traditional human body analysis methods are based on color cameras. This greatly limits the approaches, since the available information comes solely from color data. The multi-view concept represented an important leap in quality. In multi-view approaches, multiple cameras record the same scene simultaneously, making it possible to use 3D information thanks to stereo combination algorithms. Having 3D information is a key point, since the human body moves in three-dimensional space. Thus, problems such as occlusions can be mitigated if 3D information is available. The appearance of commercial depth cameras has brought a second leap in the field of human body analysis. While traditional multi-view methods require a heavy and expensive setup and a precise calibration of all cameras, the new depth cameras provide 3D information directly with a single sensor. These cameras can be installed quickly in a wide variety of environments, greatly broadening the range of applications, which was very limited with multi-view approaches. Moreover, since depth cameras are based on infrared light, they do not suffer from problems related to lighting changes. In this thesis, we focus on the study of the information provided by depth cameras and its application to the problem of human body analysis. We propose new ways of describing depth data through specific descriptors capable of emphasizing scene characteristics that will be useful for subsequent human body analysis. These descriptors exploit the 3D structure of depth data to outperform generic 3D descriptors or color-based ones. We also study the person detection problem, proposing a robust and fast method for detecting heads. We extend this method to obtain a hand tracking algorithm, which has been used throughout the thesis. In the final part of the document, we focus on hand analysis as a sub-area of human body analysis. Due to the recent appearance of depth cameras, there is a lack of public datasets. We contribute a dataset designed for finger localization and gesture recognition using depth data. This dataset is the starting point for two contributions on finger localization and gesture recognition based on classification techniques. In these methods, we also exploit the aforementioned descriptor proposals to better adapt to the nature of depth data.

  • Enhanced foreground segmentation and tracking combining Bayesian background, shadow and foreground modeling

     Gallego Vila, Jaime; Pardas Feliu, Montserrat; Haro Ortega, Gloria
    Pattern recognition letters
    Date of publication: 2012-09-01
    Journal article

  • Real-time upper body tracking with online initialization using a range sensor

     Lopez Mendez, Adolfo; Alcoverro Vidal, Marcel; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    International Conference on Computer Vision
    Presentation's date: 2011-11-07
    Presentation of work at congresses

    We present a novel method for upper body pose estimation with online initialization of pose and the anthropometric profile. Our method is based on a Hierarchical Particle Filter that defines its likelihood function with a single view depth map provided by a range sensor. We use Connected Operators on range data to detect hand and head candidates that are used to enrich the Particle filter’s proposal distribution, but also to perform an automated initialization of the pose and the anthropometric profile estimation. A GPU based implementation of the likelihood evaluation yields real-time performance. Experimental validation of the proposed algorithm and the real-time implementation are provided, as well as a comparison with the recently released OpenNI tracker for the Kinect sensor.

  • A real-time body tracking system for smart rooms

     Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Multimedia and Expo
    Presentation's date: 2011-07-12
    Presentation of work at congresses

    We present a real-time human body tracking system for a single user in a Smart Room scenario. In this paper we propose a novel system that involves a silhouette-based cost function using variable windows, a hierarchical optimization method, parallel implementations of pixel-based algorithms and efficient usage of a low-cost hardware structure. Results in a Smart Room setup are presented.

  • Joint multi-view foreground segmentation and 3D reconstruction with tolerance loop

     Gallego, Jaime; Salvador, Jordi; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Image Processing
    Presentation's date: 2011-09-12
    Presentation of work at congresses

  • Foreground objects segmentation for moving camera scenarios based on SCGMM

     Gallego Vila, Jaime; Pardas Feliu, Montserrat; Solano, Montse
    International Workshop on Computational Intelligence for Multimedia Understanding
    Presentation's date: 2011
    Presentation of work at congresses

  • Connected Operators on 3D data for human body analysis

     Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2011
    Presentation of work at congresses

    This paper presents a novel method for filtering and extraction of human body features from 3D data, either from multi-view images or range sensors. The proposed algorithm consists in processing the geodesic distances on a 3D surface representing the human body in order to find prominent maxima representing salient points of the human body. We introduce a 3D surface graph representation and filtering strategies to enhance robustness to noise and artifacts present in this kind of data. We conduct several experiments on different datasets involving 2 multi-view setups and 2 range data sensors: Kinect and Mesa SR4000. In all of them, the proposed algorithm shows a promising performance towards human body analysis with 3D data.

  • Approximate partitioning of observations in hierarchical particle filter body tracking

     Lopez Mendez, Adolfo; Alcoverro Vidal, Marcel; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2011-06-20
    Presentation of work at congresses

    This paper presents a model-based hierarchical particle filtering algorithm to estimate the pose and anthropometric parameters of humans in multi-view environments. Our method incorporates a novel likelihood measurement approach consisting of an approximate partitioning of observations. Provided that a partitioning of the human body model has been defined and associates body parts to state space variables, the proposed method estimates image regions that are relevant to that body part and thus to the state space variables of interest. The proposed regions are bounding boxes and consequently can be efficiently processed in a GPU. The algorithm is tested in a challenging dataset involving people playing tennis (TennisSense) and also in the well-known HumanEva dataset. The obtained results show the effectiveness of the proposed method.

  • Work in progress - Cooperative and competitive projects for engaging students in advanced ICT subjects

     Pardas Feliu, Montserrat; Bonafonte Cavez, Antonio Jesus
    Annual Frontiers in Education Conference
    Presentation's date: 2011
    Presentation of work at congresses

    In this paper we present a specific kind of project that can be used for project-based learning in engineering subjects. The subjects must combine lectures with projects in order to provide technical competences together with additional skills such as teamwork, oral and written communication, and the application of theory to practice. The proposed projects consist of improving an elementary baseline system. The system is decomposed into modules that correspond to the topics learnt during the lectures. To improve the system, the class is divided into groups, and each group has to propose, implement, assess and report a better system. To be able to improve the system with a limited amount of time and effort, the students need to make a coherent proposal and split the project into several tasks, each usually developed by one or two students. The students within a group cooperate to achieve a better system, but groups compete for the best results. We have already implemented this kind of project in a Speech Processing course and we plan to apply it in a Video Coding course.

  • Real-time user independent hand gesture recognition from time-of-flight camera video using static and dynamic models

     Molina, Javier; Escudero-Viñolo, Marcos; Bescós Cano, Jesús; Signorelo, Alessandro; Pardas Feliu, Montserrat; Ferran, Christian; Marques Acosta, Fernando; Martínez, José Maria
    Machine vision and applications
    Date of publication: 2011-08-06
    Journal article

    The use of hand gestures offers an alternative to the commonly used human computer interfaces, providing a more intuitive way of navigating among menus and multimedia applications. This paper presents a system for hand gesture recognition devoted to controlling windows applications. Starting from the images captured by a time-of-flight camera (a camera that produces images with an intensity level inversely proportional to the depth of the objects observed), the system performs hand segmentation as well as a low-level extraction of potentially relevant features related to the morphological representation of the hand silhouette. Classification based on these features discriminates between a set of possible static hand postures; the result, combined with the estimated motion pattern of the hand, leads to the recognition of dynamic hand gestures. The whole system works in real time, allowing practical interaction between user and application.

  • Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat; Monte Moreno, Enrique
    Eurasip journal on advances in signal processing
    Date of publication: 2011-11-23
    Journal article

    This article presents a new approach to the problem of simultaneous tracking of several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, being the starting point of the processing chain. We review how the initiation and termination of tracks influences the overall tracker performance, and present a Bayesian approach to efficiently create and destroy tracks. Two Monte Carlo-based schemes adapted to the incoming 3D discrete data are introduced. First, a particle filtering technique is proposed relying on a volume likelihood function taking into account both occupancy and color information. Sparse sampling is presented as an alternative based on a sampling of the surface voxels in order to estimate the centroid of the tracked people. In this case, the likelihood function is based on local neighborhoods computations thus dramatically decreasing the computational load of the algorithm. A discrete 3D re-sampling procedure is introduced to drive these samples along time. Multiple targets are tracked by means of multiple filters, and interaction among them is modeled through a 3D blocking scheme. Tests over CLEAR-annotated database yield quantitative results showing the effectiveness of the proposed algorithms in indoor scenarios, and a fair comparison with other state-of-the-art algorithms is presented. We also consider the real-time performance of the proposed algorithm.

  • Human motion capture using scalable body models

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Computer vision and image understanding
    Date of publication: 2011-10
    Journal article

    This paper presents a general analysis framework towards exploiting the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. Scalable human body models are introduced as an ordered set of articulated models fulfilling an inclusive hierarchy. The concept of annealing is applied to derive a generic particle filtering scheme able to perform a sequential filtering over the set of models contained in the scalable human body model. Two annealing loops are employed, the standard likelihood annealing and the newly introduced structural annealing, leading to a robust, progressive and efficient analysis of the input data. The validity of this scheme is tested by performing markerless human motion capture in a multi-camera environment employing the standard HumanEva annotated datasets. Finally, quantitative results are presented and compared with other existing HMC techniques.

  • MEDIA AESTHETICS BASED MULTIMEDIA STORYTELLING  Open access

     Obrador Espinosa, Pere
    Defense's date: 2011-07-08
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for the people that are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives, and also uploaded to online social networks on a daily basis. The work presented in this dissertation is a contribution in the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos in order to be shared with other people. As opposed to some prior art in this area, we have taken an approach in which neither user generated tags nor comments --that describe the photographs, either in their local or on-line repositories-- are taken into account, and also no user interaction with the algorithms is expected. We take an image analysis approach where both the context images --e.g. images from online social networks to which the image stories are going to be uploaded--, and the collection images --i.e., the collection of images or videos that needs to be summarized into a story--, are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia-storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they choose the media based on their relevance to the story as well as on their aesthetic value. 
Therefore, one of the main contributions of our work has been the design of computational models --both regression based, as well as classification based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human centric approach has been used in all experiments where it was feasible, and also in order to assess the final summarization results, i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media, or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process which removes some of the groundwork that would be tedious and time consuming for the user. Overall, the main contributions of this work can be summarized in three: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes.

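  • Procesado de vídeo multicámara empleando información de la escena: aplicación a eventos deportivos, interacción visual y 3DTV [Multi-camera video processing using scene information: application to sports events, visual interaction and 3DTV]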

     Giro Nieto, Xavier; Oliveras Verges, Albert; Gasull Llampallas, Antoni; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Sayrol Clols, Elisa; Pardas Feliu, Montserrat; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Vilaplana Besler, Veronica; Casas Pla, Josep Ramon
    Participation in a competitive project

  • Real-time 3D multi-person tracking using Monte Carlo surface sampling  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2010-06
    Presentation of work at congresses

    The current paper presents a low-complexity approach to the problem of simultaneous tracking of several people in low resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene. The proposed filtering technique estimates the centroid of a target using only a sparse set of points placed on its surface and making this set evolve along time based on the seminal particle filtering principle. In this case, the likelihood function is based on local neighborhoods computations thus drastically decreasing the computational load of the algorithm. In order to handle multiple interacting targets, a separate filter is assigned to each subject in the scenario while a blocking scheme is employed to model their interactions. Tests over a standard annotated dataset yield quantitative results showing the effectiveness of the proposed technique in both accuracy and real-time performance.

  • Spatio-temporal alignment and hyperspherical radon transform for 3D gait recognition in multi-view environments  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2010-06
    Presentation of work at congresses

    This paper presents a view-invariant approach to gait recognition in multi-camera scenarios exploiting a joint spatio-temporal data representation and analysis. First, multi-view information is employed to generate a 3D voxel reconstruction of the scene under study. The analyzed subject is tracked and its centroid and orientation allow recentering and aligning the volume associated to it, thus obtaining a representation invariant to translation, rotation and scaling. Temporal periodicity of the walking cycle is extracted to align the input data in the time domain. Finally, Hyperspherical Radon Transform is presented as an efficient tool to obtain features from spatio-temporal gait templates for classification purposes. Experimental results prove the validity and robustness of the proposed method for gait recognition tasks with several covariates.

  • Skeleton and shape adjustment and tracking in multicamera environments

     Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Conference on Articulated Motion and Deformable Objects
    Presentation's date: 2010-07-07
    Presentation of work at congresses

  • Enhanced bayesian foreground segmentation using brightness and color distortion region-based model for shadow removal

     Gallego, Jaime; Pardas Feliu, Montserrat
    IEEE International Conference on Image Processing
    Presentation's date: 2010-09
    Presentation of work at congresses

  • Shape from incomplete silhouettes based on the reprojection error

     Haro Ortega, Gloria; Pardas Feliu, Montserrat
    Image and vision computing
    Date of publication: 2010-09
    Journal article

  • Improved 3D reconstruction in smart-room environments using ToF imaging

     Gudmundsson, Sigurjon Arni; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon; Sveinsson, Johannes; Aanaes, Henrik; Larsen, Rasmus
    Computer vision and image understanding
    Date of publication: 2010-12
    Journal article

  • Region-based face detection, segmentation and tracking. Framework definition and application to other objects  Open access

     Vilaplana Besler, Veronica
    Defense's date: 2010-12-17
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    One of the central problems in computer vision is the automatic recognition of object classes. In particular, the detection of the class of human faces is a problem that generates special interest due to the large number of applications that require face detection as a first step. In this thesis we approach the problem of face detection as a joint detection and segmentation problem, in order to precisely localize faces with pixel accurate masks. Even though this is our primary goal, in finding a solution we have tried to create a general framework as independent as possible of the type of object being searched. For that purpose, the technique relies on a hierarchical region-based image model, the Binary Partition Tree, where objects are obtained by the union of regions in an image partition. In this work, this model is optimized for the face detection and segmentation tasks. Different merging and stopping criteria are proposed and compared through a large set of experiments. In the proposed system the intra-class variability of faces is managed within a learning framework. The face class is characterized using a set of descriptors measured on the tree nodes, and a set of one-class classifiers. The system is formed by two strong classifiers. First, a cascade of binary classifiers simplifies the search space, and afterwards, an ensemble of more complex classifiers performs the final classification of the tree nodes. The system is extensively tested on different face data sets, producing accurate segmentations and proving to be quite robust to variations in scale, position, orientation, lighting conditions and background complexity. We show that the technique proposed for faces can be easily adapted to detect other object classes. Since the construction of the image model does not depend on any object class, different objects can be detected and segmented using the appropriate object model on the same image model. 
New object models can be easily built by selecting and training a suitable set of descriptors and classifiers. Finally, a tracking mechanism is proposed. It combines the efficiency of the mean-shift algorithm with the use of regions to track and segment faces through a video sequence, where both the face and the camera may move. The method is extended to deal with other deformable objects, using a region-based graph-cut method for the final object segmentation at each frame. Experiments show that both mean-shift based trackers produce accurate segmentations even in difficult scenarios such as those with similar object and background colors and fast camera and object movements.

    One of the most important problems in the area of computer vision is the automatic recognition of object classes. In particular, the detection of the class of human faces is a problem that generates special interest due to the large number of applications that require detecting the faces in the scene as a first step. This thesis analyzes the face detection problem as a joint detection and segmentation problem, in order to precisely localize the faces in the scene with masks that reach pixel-level accuracy. Although this is the main goal of the thesis, in the process of finding a solution an attempt has been made to create a general framework that is as independent as possible of the type of object being searched for. To this end, the proposed technique uses a hierarchical region-based image model, the Binary Partition Tree (BPT), in which objects are obtained as the union of regions that come from a partition of the image. In this work, the model has been optimized for the face detection and segmentation tasks. For this purpose, different merging and stopping criteria are proposed and compared over a large set of experiments. In the proposed system, the variability within the face class is studied within a machine learning framework. The face class is characterized using a set of descriptors, measured on the tree nodes, together with a set of one-class classifiers. The system consists of two strong classifiers. First, a cascade of binary classifiers is used to simplify the search space and, afterwards, an ensemble of more complex classifiers is applied to produce the final classification of the tree nodes. 
The system is tested exhaustively on different face databases, on which accurate segmentations are obtained, thus proving the robustness of the system to variations in scale, position, orientation, lighting conditions and complexity of the scene background. This thesis also shows that the technique proposed for faces can be easily adapted to the detection and segmentation of other object classes. Since the construction of the image model does not depend on the object class being sought, different object classes can be detected and segmented by using the appropriate object model on the same image model. New object models can be easily built by selecting and training a suitable set of descriptors and classifiers. Finally, a tracking mechanism is proposed. This mechanism combines the efficiency of the mean-shift algorithm with the use of regions to track and segment faces throughout a video sequence in which both the camera and the face may move. The method is extended to the tracking of other deformable objects, using a region-based version of the graph-cut technique to obtain the final object segmentation in each frame. The experiments show that both versions of the mean-shift based tracking system produce accurate segmentations, even in difficult settings such as when the object and the scene background have similar colors or when there is fast movement of either the camera or the object.

  • Adquisición multicámara para Free Viewpoint Video (MC4FVV)

     Pardas Feliu, Montserrat; Giro Nieto, Xavier; Vilaplana Besler, Veronica; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Gasull Llampallas, Antoni; Oliveras Verges, Albert; Sayrol Clols, Elisa; Casas Pla, Josep Ramon
    Participation in a competitive project

  • Format-Agnostic SCript-based INterAcTive Experience

     Casas Pla, Josep Ramon; Morros Rubió, Josep Ramon; Marques Acosta, Fernando; Pardas Feliu, Montserrat; Ruiz Hidalgo, Javier
    Participation in a competitive project

  • Marker-based human motion capture in multi-view sequences

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Eurasip journal on advances in signal processing
    Date of publication: 2010-12
    Journal article

  • Skeleton and shape adjustment and tracking in multicamera environments

     Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Lecture notes in computer science
    Date of publication: 2010-07
    Journal article

    In this paper we present a method for automatic body model adjustment and motion tracking in multicamera environments. We introduce a set of shape deformation parameters based on linear blend skinning that allow a deformation related to the scaling of the distinct bones of the body model skeleton, and a deformation in the radial direction of a bone. The adjustment of a generic body model to a specific subject is achieved by estimating those shape deformation parameters. This estimation combines a local optimization method and hierarchical particle filtering, and uses an efficient GPU-based cost function computed from foreground silhouettes. The estimation takes anthropometric constraints into account by using rejection sampling in the particle propagation step. We propose a hierarchical particle filtering method for motion tracking using the adjusted model. We show accurate model adjustment and tracking for distinct subjects in a five-camera setup.

  • Model-based hand gesture tracking in ToF image sequences

     Gudmundsson, Sigurjon; Sveinsson, Johannes; Pardas Feliu, Montserrat; Aanaes, Henrik; Larsen, Rasmus
    Lecture notes in computer science
    Date of publication: 2010-07
    Journal article

    This paper presents a Time-of-Flight (ToF) camera based system for hand motion and gesture tracking. A 27-degree-of-freedom (DOF) hand model is constructed and fleshed out by ellipsoids. This allows the synthesis of range images of the model through projective geometry. The hand pose is then tracked with a particle filter by statistically measuring the hypothetical pose against the ToF input image, where the inside/outside alignment of the hand pixels and the depth differences serve as classifying metrics. The high-DOF tracking problem for the particle filter is addressed by reducing the high dimensionality of the joint angle space to a low-dimensional space via Principal Component Analysis (PCA). The basis vectors are learned from a few basic model configurations and the transformations between these poses. This results in a system capable of practical hand tracking in a restricted gesture configuration space.

  • Human motion capture with scalable body models.

     Canton Ferrer, Cristian
    Defense's date: 2009-07-21
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Voxel based annealed particle filtering for markerless 3D articulated motion capture  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    3DTV Conference
    Presentation's date: 2009-05-06
    Presentation of work at congresses

    This paper presents a view-independent approach to markerless human motion capture in low resolution sequences from multiple calibrated and synchronized cameras. Redundancy among cameras is exploited to generate a 3D voxelized representation of the scene, and a human body model (HBM) is introduced to analyze these data. An annealed particle filtering scheme is employed in which every particle encodes an instance of the pose of the HBM. The likelihood between particles and input data is evaluated using occupancy and surface information, and kinematic constraints are imposed in the propagation step to avoid impossible poses. Tests over the HumanEva annotated dataset yield quantitative results showing the effectiveness of the proposed algorithm.
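
    As an illustration only, the weighting step that annealed particle filters typically use can be sketched as follows: each particle's likelihood is raised to an exponent beta before normalization, so that early layers (small beta) keep the weights smooth and later layers (beta close to 1) sharpen them. The function name and the example values are assumptions, and the abstract does not state the annealing schedule or likelihood used in the paper.

```python
def annealed_weights(likelihoods, beta):
    """Normalize particle likelihoods raised to the annealing exponent beta."""
    powered = [value ** beta for value in likelihoods]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical likelihoods for three pose particles.
likelihoods = [0.1, 0.2, 0.7]
smooth = annealed_weights(likelihoods, 0.25)  # early layer: flatter weights
sharp = annealed_weights(likelihoods, 1.0)    # final layer: plain normalization
```

    With a small exponent the best particle dominates less, which leaves more particles alive for the next layer's resampling.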

  • Bayesian foreground segmentation and tracking using pixel-wise background model and region-based foreground model  Open access

     Gallego Vila, Jaime; Pardas Feliu, Montserrat; Haro Ortega, Gloria
    IEEE International Conference on Image Processing
    Presentation's date: 2009-11
    Presentation of work at congresses

    In this paper we present a segmentation system for monocular video sequences with a static camera that aims at foreground/background separation and tracking. We propose to combine a simple pixel-wise model for the background with a general-purpose region-based model for the foreground. The background is modeled using one Gaussian per pixel, thus achieving a precise and easy-to-update model. The foreground is modeled using a Gaussian Mixture Model with feature vectors consisting of the spatial (x, y) and colour (r, g, b) components. The spatial components of this model are updated using the Expectation Maximization algorithm after the classification of each frame. The background model is formulated in the 5-dimensional feature space in order to be able to apply a Maximum A Posteriori framework for the classification. The classification is done using a graph cut algorithm that allows neighborhood information to be taken into account. The results presented in the paper show the improvement of the system in situations where the foreground objects have colors similar to those of the background.
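
    The "one Gaussian per pixel" background model can be illustrated with a minimal sketch. It covers only the background side and replaces the paper's MAP and graph-cut classification with a plain threshold on the distance to the mean; the grayscale simplification, the class and function names, the threshold `k` and the learning rate `rho` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PixelGaussian:
    """Running Gaussian model of one background pixel (grayscale for brevity)."""
    mean: float
    var: float

    def is_background(self, value: float, k: float = 2.5) -> bool:
        # Background if the value lies within k standard deviations of the mean.
        return (value - self.mean) ** 2 <= (k * k) * self.var

    def update(self, value: float, rho: float = 0.05) -> None:
        # Exponential running update of mean and variance.
        diff = value - self.mean
        self.mean += rho * diff
        self.var = (1.0 - rho) * self.var + rho * diff * diff


def segment_frame(model, frame, k=2.5, rho=0.05):
    """Return a foreground mask (True = foreground) and update background pixels."""
    mask = []
    for gaussian, value in zip(model, frame):
        if gaussian.is_background(value, k):
            gaussian.update(value, rho)
            mask.append(False)
        else:
            mask.append(True)
    return mask


# Four pixels with the same initial background statistics (hypothetical values).
model = [PixelGaussian(mean=100.0, var=16.0) for _ in range(4)]
mask = segment_frame(model, [101.0, 99.0, 180.0, 100.0])
```

    Only the third pixel deviates strongly from its Gaussian, so it is the only one flagged as foreground, and its model is left untouched while the others are updated.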

  • 3D shape from multi-camera views by error projection minimization  Open access

     Haro Ortega, Gloria; Pardas Feliu, Montserrat
    Workshop on Image Analysis for Multimedia Interactive Services
    Presentation's date: 2009-05-08
    Presentation of work at congresses

    Traditional shape-from-silhouette methods compute the 3D shape as the intersection of the back-projected silhouettes in 3D space, the so-called visual hull. However, silhouettes obtained with background subtraction techniques often contain misdetection errors (produced by false negatives or occlusions), which result in incomplete 3D shapes. Our approach deals with misdetections and noise in the silhouettes. We recover the voxel occupancy that describes the 3D shape by minimizing an energy based on an approximation of the error between the 2D projections of the shape and the silhouettes. The energy also includes regularization and takes into account the visibility of the voxels in each view in order to handle self-occlusions.
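
    The traditional visual hull that this abstract takes as its starting point can be sketched as a voxel test: a voxel is kept only if it projects inside the silhouette in every view. The toy orthographic cameras, the grid size and the silhouettes below are assumptions for illustration, and the energy minimization proposed in the paper is not reproduced.

```python
from itertools import product


def visual_hull(size, views):
    """Keep voxels whose projection lies inside the silhouette of every view.

    `views` is a list of (project, silhouette) pairs, where `project` maps a
    voxel (x, y, z) to a 2D pixel and `silhouette` is a set of foreground pixels.
    """
    hull = set()
    for voxel in product(range(size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            hull.add(voxel)
    return hull


# Toy orthographic cameras looking along the z and x axes (an assumption).
def along_z(voxel):
    return (voxel[0], voxel[1])


def along_x(voxel):
    return (voxel[1], voxel[2])


sil_z = {(1, 1), (1, 2)}
sil_x = {(1, 1), (2, 1)}
hull = visual_hull(3, [(along_z, sil_z), (along_x, sil_x)])

# Dropping one silhouette pixel in a single view (a misdetection) removes a voxel.
damaged = visual_hull(3, [(along_z, sil_z), (along_x, {(1, 1)})])
```

    The second call shows the weakness described above: a single false negative in one silhouette is enough to carve a voxel out of the reconstructed shape.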

  • Visual hull reconstruction algorithms comparison: towards robustness to silhouette errors

     Alcoverro Vidal, Marcel; Pardas Feliu, Montserrat
    International Conference on Computer Vision Theory and Applications
    Presentation's date: 2009-02
    Presentation of work at congresses

  • Towards a low cost multi-camera marker based human motion capture system  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Image Processing
    Presentation's date: 2009-11-09
    Presentation of work at congresses

    This paper presents a low-cost, real-time alternative to available commercial human motion capture systems. First, a set of distinguishable markers is placed on several human body landmarks and the scene is captured by a number of calibrated and synchronized cameras. In order to establish a physical relation among markers, a human body model (HBM) is defined. Markers are detected in all camera views and delivered as the input to an annealed particle filter scheme in which every particle encodes an instance of the pose of the HBM to be estimated. The likelihood between particles and input data is evaluated through the generalized symmetric epipolar distance, and kinematic constraints are enforced in the propagation step to avoid impossible poses. Tests over the HumanEva annotated dataset yield quantitative results showing the effectiveness of the proposed algorithm. Results over sequences involving fast and complex motions are also presented.

  • Multi-Person Tracking Strategies Based on Voxel Analysis

     Casas Pla, Josep Ramon; Canton Ferrer, Cristian; Salvador, J; Pardas Feliu, Montserrat
    Date of publication: 2009-01
    Book chapter

  • Head Orientation Estimation using Particle Filtering in Multiview Scenarios

     Casas Pla, Josep Ramon; Canton Ferrer, Cristian; Pardas Feliu, Montserrat
    Date of publication: 2009-01
    Book chapter

  • Activity classification

     Nickel, Kai; Pardas Feliu, Montserrat; Stiefelhagen, Rainer; Canton Ferrer, Cristian; Landabaso Diaz, Jose Luis; Casas Pla, Josep Ramon
    Date of publication: 2009-05-31
    Book chapter

    One of the most basic building blocks for the understanding of human actions and interactions is the accurate detection and tracking of persons in a scene. In constrained scenarios involving at most one subject, or in situations where persons can be confined to a controlled monitoring space or required to wear markers, sensors, or microphones, these tasks can be solved with relative ease. However, when accurate localization and tracking have to be performed in an unobtrusive or discreet fashion, using only distantly placed microphones and cameras, in a variety of natural and uncontrolled scenarios, the challenges posed are much greater. The problems faced by video analysis are those of poor or uneven illumination, low resolution, clutter or occlusion, unclean backgrounds, and multiple moving and uncooperative users that are not always easily distinguishable.

  • Image and video processing tools for HCI

     Canton Ferrer, Cristian; Pardas Feliu, Montserrat; Vilaplana Besler, Veronica
    Date of publication: 2009
    Book chapter

  • GRUP DE PROCESSAMENT D'IMATGE I VIDEO (GPI)

     Ruiz Hidalgo, Javier; Marques Acosta, Fernando; Oliveras Verges, Albert; Sayrol Clols, Elisa; Pardas Feliu, Montserrat; Morros Rubió, Josep Ramon; Vilaplana Besler, Veronica; Giro Nieto, Xavier; Gasull Llampallas, Antoni; Salembier Clairon, Philippe Jean; Casas Pla, Josep Ramon
    Participation in a competitive project

  • Trajectory tree as an object-oriented hierarchical representation for video

     Chang Dorea, Camilo; Pardas Feliu, Montserrat; Marques Acosta, Fernando
    IEEE transactions on circuits and systems for video technology
    Date of publication: 2009-04
    Journal article

  • Multi-person tracking strategies based on voxel analysis

     Canton Ferrer, Cristian; Salvador, J; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Lecture notes in computer science
    Date of publication: 2008-06
    Journal article

    This paper presents two approaches to the problem of simultaneous tracking of several people in low resolution sequences from multiple calibrated cameras. Spatial redundancy is exploited to generate a discrete 3D binary representation of the foreground objects in the scene. Color information obtained from a zenithal camera view is added to this 3D information. The first tracking approach implements heuristic association rules between blobs labelled according to spatiotemporal connectivity criteria. Association rules are based on a cost function which considers their placement and color histogram. In the second approach, a particle filtering scheme adapted to the incoming 3D discrete data is proposed. A volume likelihood function and a discrete 3D re-sampling procedure are introduced to evaluate and drive particles. Multiple targets are tracked by means of multiple particle filters and interaction among them is modeled through a 3D blocking scheme. Evaluation over the CLEAR 2007 database yields quantitative results assessing the performance of the proposed algorithm for indoor scenarios.

  • Head Orientation Estimation Using Particle Filtering in Multiview Scenarios

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Lecture notes in computer science
    Date of publication: 2008-06
    Journal article

  • Exploiting Structural Hierarchy in Articulated Objects Towards Robust Motion Capture

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Lecture notes in computer science
    Date of publication: 2008-07
    Journal article

  • Shape from inconsistent silhouette

     Landabaso, J L; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    Computer vision and image understanding
    Date of publication: 2008-11
    Journal article

  • Audiovisual head orientation estimation with particle filtering in multisensor scenarios  Open access

     Canton Ferrer, Cristian; Segura, C; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat; Hernando Pericas, Francisco Javier
    Eurasip journal on advances in signal processing
    Date of publication: 2008-01
    Journal article

    This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for the estimation of head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two different particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system that combines information at the data level. Experimental results obtained on the CLEAR 2006 evaluation database are reported, and the comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.

  • Multimodal Real-Time Focus of Attention Estimation in SmartRooms

     Canton Ferrer, Cristian; Segura, C; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon; Hernando Pericas, Francisco Javier
    CVPR 2008 Workshop on Human Communicative Behavior Analysis
    Presentation of work at congresses

  • Particle Filtering and Sparse Sampling for Multi-Person 3D Tracking

     Canton Ferrer, Cristian; Sblendido, R; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Image Processing
    Presentation of work at congresses

  • A UNIFIED FRAMEWORK FOR CONSISTENT 2D/3D FOREGROUND OBJECT DETECTION

     Landabaso Diaz, Jose Luis
    Defense's date: 2008-02-05
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses
