Casas Pla, Josep Ramon
Total activity: 196
Expertise
Computer vision, Image analysis, Image processing, Television, Video processing
Professional category
University lecturer
Doctoral courses
Doctor Enginyer de Telecomunicació
Research group
GPI - Image and Video Processing Group
Department
Department of Signal Theory and Communications
School
Barcelona School of Telecommunications Engineering (ETSETB)
E-mail
josep.ramon.casas@upc.edu
Contact details
UPC directory
Orcid
0000-0003-4639-6904
ResearcherID
A-2851-2010
Scopus Author ID
7203049742
Links of interest
imatge.upc.edu

Scientific and technological production

1 to 50 of 196 results
  • Gesture control interface for immersive panoramic displays

     Alcoverro Vidal, Marcel; Suau, Xavier; Morros Rubió, Josep Ramon; López Méndez, Adolfo; Gil, Albert; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Multimedia tools and applications
    Date of publication: 2013-07-25
    Journal article


    In this paper, we propose a gesture-based interface designed to interact with panoramic scenes. The system combines novel static gestures with a fast hand tracking method. Our proposal is to use static gestures as shortcuts to activate functionalities of the system (i.e. volume up/down, mute, pause, etc.), and hand tracking to freely explore the panoramic video. The overall system is multi-user, and incorporates a user identification module based on face recognition, which is able both to recognize returning users and to add new users online. The system exploits depth data, making it robust to challenging illumination conditions. We show through experimental results the performance of every component of the system compared to the state of the art. We also show the results of a usability study performed with several untrained users.

    This article is available at: http://link.springer.com/article/10.1007%2Fs11042-013-1605-7
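    As a rough, hypothetical sketch (not the authors' implementation), the gesture-shortcut scheme described in the abstract amounts to mapping recognized static gestures to player commands; the gesture names and commands below are illustrative placeholders:

```python
# Hypothetical sketch: dispatching recognized static gestures to player
# commands, as in the shortcut scheme described in the abstract.
# Gesture names and commands are illustrative, not from the paper.

COMMANDS = {
    "open_palm": "pause",
    "fist": "mute",
    "thumb_up": "volume_up",
    "thumb_down": "volume_down",
}

def dispatch(gesture, player_state):
    """Apply the command bound to a recognized static gesture."""
    command = COMMANDS.get(gesture)
    if command == "pause":
        player_state["paused"] = not player_state["paused"]
    elif command == "mute":
        player_state["muted"] = not player_state["muted"]
    elif command == "volume_up":
        player_state["volume"] = min(100, player_state["volume"] + 5)
    elif command == "volume_down":
        player_state["volume"] = max(0, player_state["volume"] - 5)
    return player_state

state = {"paused": False, "muted": False, "volume": 50}
state = dispatch("open_palm", state)
state = dispatch("thumb_up", state)
print(state)  # {'paused': True, 'muted': False, 'volume': 55}
```

    In a real system, `dispatch` would be fed from the per-frame output of the static-gesture classifier, while the hand tracker drives the free panoramic exploration separately.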

  • Multi-view video representation based on fast Monte Carlo surface reconstruction

     Salvador Marcos, Jordi; Casas Pla, Josep Ramon
    IEEE transactions on image processing
    Date of publication: 2013-09
    Journal article


    This paper provides an alternative solution to the costly representation of multi-view video data, which can be used for both rendering and scene analysis. Initially, a new efficient Monte Carlo discrete surface reconstruction method for foreground objects with static background is presented, which outperforms volumetric techniques and is suitable for GPU environments. Some extensions are also presented, which speed up the reconstruction by exploiting multi-resolution and temporal correlations. Then, a fast meshing algorithm is applied, which allows interpolating a continuous surface from the discrete reconstructed points. As shown by the experimental results, the original video frames can be approximated with high accuracy by projecting the reconstructed foreground objects onto the original viewpoints. Furthermore, the reconstructed scene can easily be projected onto any desired virtual viewpoint, thus simplifying the design of free-viewpoint video applications. In our experimental results, we show that our techniques for reconstruction and meshing compare favorably with the state of the art, and we also introduce a rule of thumb for effective application of the method with a good quality versus representation cost trade-off.

  • Detecting end-effectors on 2.5D data using geometric deformable models: Application to human pose estimation

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Computer vision and image understanding
    Date of publication: 2013-03
    Journal article


  • Human body analysis using depth data.

     Suau Cuadros, Xavier
    Defense's date: 2013-12-04
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses


    Human body analysis is one of the broadest areas of computer vision. Researchers have devoted a great deal of effort to human body analysis, especially during the last decade, owing to major technological advances in both cameras and computing power. Human body analysis covers several topics, such as person detection and segmentation, body motion tracking, and action recognition. Although human beings perform these tasks naturally, they become difficult problems when tackled from a computer vision standpoint. Adverse situations, such as viewpoint perspective, occlusions, lighting conditions or the variability of behavior between people, make human body analysis a complicated task. In computer vision, research progress often goes hand in hand with technological progress, both in sensors and in computing power. Traditional human body analysis methods are based on color cameras, which strongly limits the possible approaches, since the only available information comes from color data. The multi-view concept was an important step forward: in multi-view approaches, multiple cameras record the same scene simultaneously, making 3D information available through stereo-matching algorithms. Having 3D information is a key point, since the human body moves in three-dimensional space; problems such as occlusions can thus be mitigated. The appearance of commercial depth cameras has been a second leap forward in human body analysis. While traditional multi-view methods require a cumbersome and expensive setup and a precise calibration of all the cameras, the new depth cameras directly provide 3D information with a single sensor.
    These cameras can be installed quickly in a wide variety of environments, greatly broadening the range of applications, which was very limited with multi-view approaches. Moreover, since depth cameras are based on infrared light, they do not suffer from problems related to lighting changes. In this thesis, we focus on the study of the information provided by depth cameras and its application to the human body analysis problem. We propose new ways of describing depth data through specific descriptors, capable of emphasizing scene characteristics that are useful for subsequent human body analysis. These descriptors exploit the 3D structure of depth data to outperform generic 3D or color-based descriptors. We also study the person detection problem, proposing a robust and fast head detection method, which we extend into a hand tracking algorithm used throughout the thesis. In the final part of the document, we focus on hand analysis as a sub-area of human body analysis. Owing to the recent appearance of depth cameras, there is a lack of public databases; we contribute a database designed for finger localization and gesture recognition using depth data. This database is the starting point of two contributions on finger localization and gesture recognition based on classification techniques, in which we also exploit the aforementioned descriptors to better adapt to the nature of depth data.

  • Metric learning from poses for temporal clustering of human motion

     Lopez Mendez, Adolfo; Gall, Juergen; Casas Pla, Josep Ramon; van Gool, Luc
    British Machine Vision Conference
    Presentation's date: 2012-09-04
    Presentation of work at congresses


    Temporal clustering of human motion into semantically meaningful behaviors is a challenging task. While unsupervised methods do well to some extent, the obtained clusters often lack a semantic interpretation. In this paper, we propose to learn what makes a sequence of human poses different from others such that it should be annotated as an action. To this end, we formulate the problem as weakly supervised temporal clustering for an unknown number of clusters. Weak supervision is attained by learning a metric from the implicit semantic distances derived from already annotated databases. Such a metric contains some low-level semantic information that can be used to effectively segment a human motion sequence into distinct actions or behaviors. The main advantage of our approach is that metrics can be successfully used across datasets, making our method a compelling alternative to unsupervised methods. Experiments on publicly available mocap datasets show the effectiveness of our approach.
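    At its simplest, learning a metric from annotated poses can be pictured as re-weighting pose dimensions by how well they separate the annotated actions. The following toy sketch (a crude Fisher-style diagonal metric, not the paper's formulation; pose vectors and labels are made up) illustrates the idea:

```python
from collections import defaultdict

def learn_diagonal_metric(samples, eps=1e-9):
    """samples: list of (pose_vector, action_label) pairs, standing in
    for the annotated databases mentioned in the abstract. Returns a
    per-dimension weight: between-class variance over within-class
    variance, so discriminative dimensions count more."""
    dims = len(samples[0][0])
    by_class = defaultdict(list)
    for pose, action in samples:
        by_class[action].append(pose)
    n = len(samples)
    overall = [sum(p[d] for p, _ in samples) / n for d in range(dims)]
    between, within = [0.0] * dims, [0.0] * dims
    for poses_of_class in by_class.values():
        m = len(poses_of_class)
        mean = [sum(p[d] for p in poses_of_class) / m for d in range(dims)]
        for d in range(dims):
            between[d] += m * (mean[d] - overall[d]) ** 2
            within[d] += sum((p[d] - mean[d]) ** 2 for p in poses_of_class)
    return [b / (w + eps) for b, w in zip(between, within)]

# Dimension 0 separates the two actions; dimension 1 is noise.
poses = [([0.0, 0.3], "walk"), ([0.1, 0.9], "walk"),
         ([1.0, 0.4], "jump"), ([0.9, 0.8], "jump")]
weights = learn_diagonal_metric(poses)
```

    A temporal clustering stage would then compare pose frames under the weighted distance, so that the learned semantics carry over to unseen motion sequences.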

  • Can our TV robustly understand human gestures? Real-Time Gesture Localization in Range Data

     López Méndez, Adolfo; Casas Pla, Josep Ramon
    European Conference on Visual Media Production
    Presentation's date: 2012-12-05
    Presentation of work at congresses


    The 'old' remote falls short of requirements when confronted with digital convergence for living room displays. Enriched options to watch, manage and interact with content on large displays demand improved means of interaction. Concurrently, gesture recognition is increasingly present in human-computer interaction for gaming applications. In this paper we propose a gesture localization framework for interactive display of audio-visual content. The proposed framework works with range data captured from a single consumer depth camera. We focus on still gestures because they are generally user friendly (users do not have to make complex and tiring movements) and allow formulating the problem in terms of object localization. Our method is based on random forests, which have shown an excellent performance on classification and regression tasks. In this work, however, we aim at a specific class of localization problems involving highly unbalanced data: positive examples appear during a small fraction of space and time. We study the impact of this natural unbalance on the random forest learning and we propose a framework to robustly detect gestures on range images in real applications. Our experiments with offline data show the effectiveness of our approach. We also present a real-time application where users can control the TV display with a reduced set of still gestures.
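    The natural class unbalance discussed above (positive gesture examples occupy only a small fraction of space and time) is commonly countered by re-weighting classes inversely to their frequency during learning. A generic sketch of such weighting (not the paper's random-forest formulation):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency, normalized
    so that a class with 'average' frequency gets weight 1.0."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# 1 positive gesture sample among 9 background samples
labels = ["gesture"] + ["background"] * 9
w = balanced_class_weights(labels)
print(w)  # gesture -> 5.0, background -> ~0.56
```

    Scaling the loss (or the sampling probability) of each training example by its class weight keeps the rare positive class from being ignored by the learner.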

    Google Best Student Paper Award CVMP 2012

  • INTAIRACT: Joint hand gesture and fingertip classification for touchless interaction

     Suau Cuadros, Xavier; Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    European Conference on Computer Vision
    Presentation's date: 2012-10-08
    Presentation of work at congresses


    In this demo we present intAIRact, an online hand-based touchless interaction system. Interactions are based on easy-to-learn hand gestures that, combined with translations and rotations, render a user-friendly and highly configurable system. The main advantage with respect to existing approaches is that we are able to robustly locate and identify fingertips. Hence, we are able to employ a simple but powerful alphabet of gestures, not only by determining the number of visible fingers in a gesture, but also which fingers are being observed. To achieve such a system we propose a novel method that jointly infers hand gestures and fingertip locations using a single depth image from a consumer depth camera. Our approach is based on a novel descriptor for depth data, the Oriented Radial Distribution (ORD) [1]. On the one hand, we exploit the ORD for robust classification of hand gestures by means of efficient k-NN retrieval. On the other hand, maxima of the ORD are used to perform structured inference of fingertip locations. The proposed method outperforms other state-of-the-art approaches both in gesture recognition and fingertip localization. An implementation of the ORD extraction on a GPU yields a real-time demo running at approximately 17 fps on a single laptop.
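    The k-NN retrieval step can be pictured, in a much simplified form, as nearest-neighbour voting over descriptor vectors. The sketch below uses plain Euclidean distance and placeholder 3-D vectors standing in for ORD descriptors:

```python
import math
from collections import Counter

def knn_classify(query, database, k=3):
    """Classify a descriptor vector by majority vote of its
    k nearest neighbours (Euclidean distance)."""
    dists = sorted(
        (math.dist(query, vec), label) for vec, label in database
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Placeholder 3-D descriptors standing in for ORD features
db = [
    ((0.1, 0.9, 0.0), "open_hand"),
    ((0.2, 0.8, 0.1), "open_hand"),
    ((0.9, 0.1, 0.0), "fist"),
    ((0.8, 0.2, 0.1), "fist"),
]
print(knn_classify((0.15, 0.85, 0.05), db))  # open_hand
```

    Real ORD descriptors are much higher-dimensional, and the paper's retrieval is engineered for efficiency, but the voting principle is the same.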

    Oriented radial distribution on depth data: application to the detection of end-effectors  Open access

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE International Conference on Acoustics, Speech, and Signal Processing
    Presentation's date: 2012-03-27
    Presentation of work at congresses


    End-effectors are considered to be the main topological extremities of a given 3D body. Although the nature of such a body is not restricted, this paper focuses on the human body case. Detection of human extremities is a key issue in the human motion capture domain, being needed to initialize and update the tracker. Therefore, the effectiveness of human motion capture systems usually depends on the reliability of the obtained end-effectors. The increasing accuracy, low cost and easy installation of depth cameras have opened the door to new strategies to overcome the body pose estimation problem. With the objective of detecting the head, hands and feet of a human body, we propose a new local feature computed from depth data, which gives an idea of its curvature and prominence. This feature is weighted depending on recent detections, also providing a temporal dimension. Based on this feature, some end-effector candidate blobs are obtained and classified into head, hands and feet according to three probabilistic descriptors.

  • Multi-view body tracking with a detector-driven hierarchical particle filter

     Navarro, S.; Lopez Mendez, Adolfo; Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon
    Conference on Articulated Motion and Deformable Objects
    Presentation's date: 2012
    Presentation of work at congresses


    In this paper we present a novel approach to markerless human motion capture that robustly integrates body part detections in multiple views. The proposed method fuses cues from multiple views to enhance the propagation and observation model of particle filtering methods aiming at human motion capture. We particularize our method to improve arm tracking in the publicly available IXMAS dataset. Our experiments show that the proposed method outperforms other state-of-the-art approaches.

  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE transactions on multimedia
    Date of publication: 2012-06
    Journal article


    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.
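    The depth-based template matching idea can be illustrated with a minimal sum-of-squared-differences search over a (here fixed, rather than adaptive) search zone; the depth values and template below are made-up placeholders:

```python
def match_template(depth, template, search_zone):
    """Return the (row, col) inside search_zone where the template
    best matches the depth map, by minimum sum of squared differences."""
    (r0, r1), (c0, c1) = search_zone
    th, tw = len(template), len(template[0])
    best, best_pos = float("inf"), None
    for r in range(r0, min(r1, len(depth) - th) + 1):
        for c in range(c0, min(c1, len(depth[0]) - tw) + 1):
            ssd = sum((depth[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

depth = [[3.0] * 8 for _ in range(8)]      # flat background at 3 m
for i in (2, 3):
    for j in (5, 6):
        depth[i][j] = 1.0                  # head-sized blob at 1 m
template = [[1.0, 1.0], [1.0, 1.0]]        # depth template of the head
print(match_template(depth, template, ((0, 6), (0, 6))))  # (2, 5)
```

    In the paper, the search zone adapts around the previous head estimate, which is what keeps the matching both fast and robust to fast movements.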

  • Model-based recognition of human actions by trajectory matching in phase spaces

     Lopez Mendez, Adolfo; Casas Pla, Josep Ramon
    Image and vision computing
    Date of publication: 2012-11
    Journal article


  • Articulated Models for Human Motion Analysis  Open access

     Lopez Mendez, Adolfo
    Defense's date: 2012-12-10
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses


    Human motion analysis is a broad area of computer vision that has strongly attracted the interest of researchers in the last decades. Motion analysis covers topics such as human motion tracking and estimation, action and behavior recognition, and segmentation of human motion. All these fields are challenging for different reasons, but mostly because of viewing perspectives, clutter and the imprecise semantics of actions and human motion. The computer vision community has addressed human motion analysis from several perspectives. Earlier approaches often relied on articulated human body models represented in the three-dimensional world. However, due to the traditionally high difficulty and cost of estimating such an articulated structure from video, research has focused on the development of human motion analysis approaches relying on low-level features. Although they obtain impressive results in several tasks, low-level features are typically conditioned by appearance and viewpoint, which makes their application to different scenarios difficult. Nonetheless, the increase in computational power, the massive availability of data and the irruption of consumer depth cameras are changing the scenario, and with that change human motion analysis through articulated models can be reconsidered. Analyzing and understanding human motion through 3-dimensional information is still a crucial issue in order to obtain richer models of dynamics and behavior. In that sense, articulated models of the human body offer a compact and view-invariant representation of motion that can be used to leverage motion analysis. In this dissertation, we present several approaches for motion analysis. In particular, we address the problems of pose inference, action recognition and temporal clustering of human motion. Articulated models are the leitmotiv in all the presented approaches.
    Firstly, we address pose inference by formulating a layered analysis-by-synthesis framework where models are used to generate hypotheses that are matched against video. Based on the same articulated representation upon which models are built, we propose an action recognition framework. Actions are seen as time series, observed through the articulated model and generated by underlying dynamical systems that we hypothesize are producing the time series. Such a hypothesis is used to develop recognition methods based on time-delay embeddings, which are analysis tools that make no assumptions on the form of the underlying dynamical system. Finally, we propose a method to cluster human motion sequences into distinct behaviors, without a priori knowledge of the number of actions in the sequence. Our approach relies on the articulated model representation in order to learn a distance metric from pose data. This metric aims at capturing semantics from labeled data in order to cluster unseen motion sequences into meaningful behaviors. The proposed approaches are evaluated using publicly available datasets in order to objectively measure our contributions.


  • Part-based Object Retrieval with Binary Partition Trees  Open access

     Giro Nieto, Xavier
    Defense's date: 2012-05-31
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses


    This thesis addresses the problem of visual object retrieval, where a user formulates a query to an image database by providing one or multiple examples of an object of interest. The presented techniques aim both at finding those images in the database that contain the object as well as locating the object in the image and segmenting it from the background. Every considered image, both the ones used as queries and the ones contained in the target database, is represented as a Binary Partition Tree (BPT), the hierarchy of regions previously proposed by Salembier and Garrido (2000). This data structure offers multiple opportunities and challenges when applied to the object retrieval problem. A first application of BPTs appears during the formulation of the query, when the user must interactively segment the query object from the background. Firstly, the BPT can assist in adjusting an initial marker, such as a scribble or bounding box, to the object contours. Secondly, BPT can also define a navigation path for the user to adjust an initial selection to the appropriate spatial scale. The hierarchical structure of the BPT is also exploited to extract a new type of visual words named Hierarchical Bag of Regions (HBoR). Each region defined in the BPT is described with a feature vector that combines a soft quantization on a visual codebook with an efficient bottom-up computation through the BPT. These descriptors allow the definition of a novel feature space, the Parts Space, where each object is located according to the parts that compose it. HBoR descriptors have been applied to two scenarios for object retrieval, both of them solved by considering the decomposition of the objects in parts. In the first scenario, the query is formulated with a single object exemplar which is to be matched with each BPT in the target database. 
The matching problem is solved in two stages: an initial top-down one that assumes that the hierarchy from the query is respected in the target BPT, and a second bottom-up one that relaxes this condition and considers region merges which are not in the target BPT. The second scenario where HBoR descriptors are applied considers a query composed of several visual objects. In this case, the provided exemplars are considered as a training set to build a model of the query concept. This model is composed of two levels, a first one where each part is modelled and detected separately, and a second one that characterises the combinations of parts that describe the complete object. The analysis process exploits the hierarchical nature of the BPT by using a novel classifier that drives an efficient top-down analysis of the target BPTs.

  • Google Best Paper Award CVMP

     López Méndez, Adolfo; Casas Pla, Josep Ramon
    Award or recognition


  • End user, production and hardware and network requirements. FascinatE deliverable D1.1.2

     Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon; Suau Cuadros, Xavier; Gibb, Andrew; Prins, M.J.; Zoric, G.; Engström, A.; Perry, M.; Önnevall, E.; Juhlin, O.; Hannerfors, P.; Macq, Jean François; Schreer, Oliver
    Date: 2012-02-14
    Report


  • Real-time upper body tracking with online initialization using a range sensor

     Lopez Mendez, Adolfo; Alcoverro Vidal, Marcel; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    International Conference on Computer Vision
    Presentation's date: 2011-11-07
    Presentation of work at congresses


    We present a novel method for upper body pose estimation with online initialization of pose and the anthropometric profile. Our method is based on a Hierarchical Particle Filter that defines its likelihood function with a single view depth map provided by a range sensor. We use Connected Operators on range data to detect hand and head candidates that are used to enrich the Particle filter’s proposal distribution, but also to perform an automated initialization of the pose and the anthropometric profile estimation. A GPU based implementation of the likelihood evaluation yields real-time performance. Experimental validation of the proposed algorithm and the real-time implementation are provided, as well as a comparison with the recently released OpenNI tracker for the Kinect sensor.

  • A real-time body tracking system for smart rooms

     Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Multimedia and Expo
    Presentation's date: 2011-07-12
    Presentation of work at congresses


    We present a real-time human body tracking system for a single user in a Smart Room scenario. In this paper we propose a novel system that involves a silhouette-based cost function using variable windows, a hierarchical optimization method, parallel implementations of pixel-based algorithms and efficient usage of a low-cost hardware structure. Results in a Smart Room setup are presented.

  • Joint multi-view foreground segmentation and 3D reconstruction with tolerance loop

     Gallego, Jaime; Salvador, Jordi; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE International Conference on Image Processing
    Presentation's date: 2011-09-12
    Presentation of work at congresses


  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Multimedia and Expo
    Presentation's date: 2011-07
    Presentation of work at congresses

    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on 2.5D data from a range camera, which is exploited to resolve ambiguities and overlaps. Experimental results show high robustness against partial occlusions and fast movements. The estimated positions are fairly stable, allowing the extraction of accurate trajectories which may be used for gesture classification purposes.

  • A compact 3D representation for multi-view video  awarded activity

     Salvador Marcos, Jordi; Casas Pla, Josep Ramon
    International Conference on 3D Imaging
    Presentation's date: 2011-12-07
    Presentation of work at congresses

    This paper presents a methodology for obtaining a 3D reconstruction of a dynamic scene in multi-camera settings. Our target is to derive a compact representation of the 3D scene which is effective and accurate, whatever the number of cameras and even for very wide-baseline settings. Easing real-time 3D scene capture has outstanding applications in 2D and 3D content production, free-viewpoint video of natural scenes and interactive video applications. The method proposed here makes several original contributions to accelerating the process: it exploits spatial and temporal consistency to speed up reconstruction, dividing the problem into two parts. First, 3D surfaces are efficiently sampled to obtain a silhouette-consistent set of colored surface points and normals, using a novel algorithm presented in this paper. Then, a fast, greedy meshing algorithm retrieves topologically correct continuous surfaces from the dense sets of oriented points, providing a suitable representation for multi-view video. Compared to other techniques in the literature, the presented approach is capable of retrieving 3D surfaces of foreground objects in real time by exploiting the computing capabilities of GPUs. This is feasible due to the parallelized design of the surface sampling algorithm. The reconstructed surfaces can effectively be used for interactive representations. The presented methodology also offers good scalability to large multi-view video settings.

    "Best Scientific Paper Award" awarded by 3D Stereo Media

  • Connected Operators on 3D data for human body analysis

     Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2011
    Presentation of work at congresses

    This paper presents a novel method for filtering and extraction of human body features from 3D data, either from multi-view images or range sensors. The proposed algorithm consists of processing the geodesic distances on a 3D surface representing the human body in order to find prominent maxima representing salient points of the human body. We introduce a 3D surface graph representation and filtering strategies to enhance robustness to the noise and artifacts present in this kind of data. We conduct several experiments on different datasets involving two multi-view setups and two range sensors: Kinect and Mesa SR4000. In all of them, the proposed algorithm shows promising performance towards human body analysis with 3D data.
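    The geodesic-distance idea can be sketched in a few lines (a toy illustration, not the paper's Connected Operators pipeline): hop-count geodesic distances are computed on a small surface graph, and vertices that dominate all their neighbours are picked as candidate extremities. The graph and vertex labels are invented for the example; a real mesh would use edge lengths and Dijkstra instead of BFS.

```python
from collections import deque

def geodesic_distances(adjacency, source):
    """Breadth-first geodesic (hop-count) distances on a surface graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def prominent_maxima(adjacency, dist):
    """Vertices farther from the source than all their neighbours:
    candidate extremities (head, hands) of the body surface."""
    return [v for v in adjacency
            if all(dist[v] >= dist[w] for w in adjacency[v])]

# Toy "body": a torso vertex 0 with three limbs of different lengths.
adj = {0: [1, 4, 6], 1: [0, 2], 2: [1, 3], 3: [2],
       4: [0, 5], 5: [4], 6: [0, 7], 7: [6]}
dist = geodesic_distances(adj, source=0)
tips = prominent_maxima(adj, dist)   # the three limb endpoints
```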

  • Approximate partitioning of observations in hierarchical particle filter body tracking

     Lopez Mendez, Adolfo; Alcoverro Vidal, Marcel; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2011-06-20
    Presentation of work at congresses

    This paper presents a model-based hierarchical particle filtering algorithm to estimate the pose and anthropometric parameters of humans in multi-view environments. Our method incorporates a novel likelihood measurement approach consisting of an approximate partitioning of observations. Provided that a partitioning of the human body model has been defined and associates body parts to state space variables, the proposed method estimates image regions that are relevant to that body part and thus to the state space variables of interest. The proposed regions are bounding boxes and consequently can be efficiently processed in a GPU. The algorithm is tested in a challenging dataset involving people playing tennis (TennisSense) and also in the well-known HumanEva dataset. The obtained results show the effectiveness of the proposed method.

  • Acoustic event detection based on feature-level fusion of audio and video modalities  Open access

     Butko, Taras; Canton Ferrer, Cristian; Segura, Carlos; Giro Nieto, Xavier; Nadeu Camprubí, Climent; Hernando Pericas, Francisco Javier; Casas Pla, Josep Ramon
    Eurasip journal on advances in signal processing
    Date of publication: 2011-03-15
    Journal article

    Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, mostly due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectro-temporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multi-camera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed in our work. The experimental results show that information from both the microphone array and the video cameras is useful to improve the detection rate of isolated as well as spontaneously generated acoustic events.
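    Feature-level fusion, as opposed to decision-level fusion, amounts to concatenating per-frame audio and video descriptors into one observation vector before any classifier sees them. A minimal sketch (the dimensions and z-score normalization are illustrative assumptions, not the paper's exact front-end):

```python
import numpy as np

def fuse_features(audio_feat, video_feat):
    """Feature-level fusion: per-frame audio and video descriptors are
    normalized and concatenated into a single vector per frame; HMM-based
    detectors would then be trained on the fused vectors."""
    def zscore(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    return np.hstack([zscore(audio_feat), zscore(video_feat)])

# Hypothetical sizes: 100 frames, 16 audio dims (spectro-temporal +
# 3D source localization), 6 video dims (motion / tracking features).
audio = np.random.default_rng(0).normal(size=(100, 16))
video = np.random.default_rng(1).normal(size=(100, 6))
fused = fuse_features(audio, video)   # one 22-dim vector per frame
```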

  • Multi-camera multi-object voxel-based Monte Carlo 3D tracking strategies  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat; Monte Moreno, Enrique
    Eurasip journal on advances in signal processing
    Date of publication: 2011-11-23
    Journal article

    This article presents a new approach to the problem of simultaneously tracking several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, which is the starting point of the processing chain. We review how the initiation and termination of tracks influence the overall tracker performance, and present a Bayesian approach to efficiently create and destroy tracks. Two Monte Carlo-based schemes adapted to the incoming 3D discrete data are introduced. First, a particle filtering technique is proposed relying on a volume likelihood function that takes into account both occupancy and color information. Sparse sampling is presented as an alternative, based on a sampling of the surface voxels in order to estimate the centroid of the tracked people. In this case, the likelihood function is based on local neighborhood computations, thus dramatically decreasing the computational load of the algorithm. A discrete 3D re-sampling procedure is introduced to drive these samples over time. Multiple targets are tracked by means of multiple filters, and interaction among them is modeled through a 3D blocking scheme. Tests over the CLEAR-annotated database yield quantitative results showing the effectiveness of the proposed algorithms in indoor scenarios, and a fair comparison with other state-of-the-art algorithms is presented. We also consider the real-time performance of the proposed algorithm.

  • Human motion capture using scalable body models

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Computer vision and image understanding
    Date of publication: 2011-10
    Journal article

    This paper presents a general analysis framework towards exploiting the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. Scalable human body models are introduced as an ordered set of articulated models fulfilling an inclusive hierarchy. The concept of annealing is applied to derive a generic particle filtering scheme able to perform a sequential filtering over the set of models contained in the scalable human body model. Two annealing loops are employed, the standard likelihood annealing and the newly introduced structural annealing, leading to a robust, progressive and efficient analysis of the input data. The validity of this scheme is tested by performing markerless human motion capture in a multi-camera environment employing the standard HumanEva annotated datasets. Finally, quantitative results are presented and compared with other existing HMC techniques.

  • MEDIA AESTHETICS BASED MULTIMEDIA STORYTELLING  Open access

     Obrador Espinosa, Pere
    Defense's date: 2011-07-08
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences --i.e., both image and video capture-- has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for the people who are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives, and also uploaded to online social networks, on a daily basis. The work presented in this dissertation is a contribution in the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos so that it can be shared with other people. As opposed to some prior art in this area, we have taken an approach in which neither user-generated tags nor comments --that describe the photographs, either in their local or online repositories-- are taken into account, and no user interaction with the algorithms is expected. We take an image analysis approach where both the context images --e.g., images from online social networks to which the image stories are going to be uploaded-- and the collection images --i.e., the collection of images or videos that needs to be summarized into a story-- are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they choose the media based on their relevance to the story as well as on their aesthetic value.
Therefore, one of the main contributions of our work has been the design of computational models --both regression-based and classification-based-- that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human-centric approach has been used in all experiments where it was feasible, and also in order to assess the final summarization results, i.e., humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media, or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the storytelling creative process which removes some of the groundwork that would be tedious and time consuming for the user. Overall, the main contributions of this work can be summarized in three: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes.

  • SURFACE RECONSTRUCTION FOR MULTI-VIEW VIDEO  Open access

     Salvador Marcos, Jordi
    Defense's date: 2011-09-23
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    This thesis introduces a methodology for obtaining an alternative representation of video sequences captured by calibrated multi-camera systems in controlled environments with known scene background. This representation consists in a 3D description of the surfaces of foreground objects, which allows for the recovering of part of the 3D information of the original scene lost in the projection process in each camera. The choice of the type of representation and the design of the reconstruction techniques are driven by three requirements that appear in smart rooms or recording studios. In these scenarios, video sequences captured by a multi-camera rig are used both for analysis applications and interactive visualization methods. The requirements are: the reconstruction method must be fast in order to be usable in interactive applications, the surface representation must provide a compression of the multi-view data redundancies and this representation must also provide all the relevant information to be used for analysis applications as well as for free-viewpoint video. Once foreground and background are segregated for each view, the reconstruction process is divided in two stages. The first one obtains a sampling of the foreground surfaces (including orientation and texture), whereas the second provides closed, continuous surfaces from the samples, through interpolation. The sampling process is interpreted as a search for 3D positions that result in feature matchings between different views. This search process can be driven by different mechanisms: an image-based approach, another one based on the deformation of a surface from frame to frame or a statistical sampling approach where samples are searched around the positions of other detected samples, which is the fastest and easiest to parallelize of the three approaches. A meshing algorithm is also presented, which allows for the interpolation of surfaces between samples. 
Starting from an initial triangle, which connects three coherently oriented points, an iterative expansion of the surface over the complete set of samples takes place. The proposed method presents a very accurate reconstruction and results in a correct topology. Furthermore, it is fast enough to be used interactively. The presented methodology for surface reconstruction permits obtaining a fast, compressed and complete representation of foreground elements in multi-view video, as reflected by the experimental results.

    This thesis presents different techniques for defining a methodology to obtain an alternative representation of the video sequences captured by calibrated multi-camera systems in controlled environments with known scene background. As the title of the thesis suggests, this representation consists of a three-dimensional description of the surfaces of the foreground objects. This approach to representing the multi-view data allows part of the three-dimensional information of the original scene, lost in the projection process performed by each camera, to be recovered. The choice of the type of representation and the design of the scene reconstruction techniques respond to three requirements that appear in controlled environments such as smart rooms or recording studios, where the sequences captured by the multi-camera system are used both for analysis applications and for different interactive visualization methods. The first requirement is that the reconstruction method must be fast, so that it can be used in interactive applications. The second is that the surface representation must be efficient, resulting in a compression of the multi-view data. The third requirement is that this representation must be effective, that is, usable in analysis applications as well as for visualization. Once the foreground and background contents of each view are separated (possible in controlled environments with known background), the strategy followed in the thesis is to divide the reconstruction process into two stages. The first consists of obtaining a sampling of the surfaces (including orientation and texture). The second stage provides closed, continuous surfaces from the set of samples through an interpolation process.
The result of the first stage is a set of oriented points in 3D space that locally represent the position, orientation and texture of the surfaces visible to the camera set. The sampling process is interpreted as a search for 3D positions that result in matchings of image features between different views. This search can be driven by different mechanisms, which are presented in the first part of the thesis. The first proposal is an image-based method that searches for surface samples along the half-line that starts at the projection center of each camera and passes through a given point of the corresponding image. This method adapts well to exploiting photo-consistency in a static scene and has favorable characteristics for implementation on GPUs (desirable), but it is not designed to exploit the temporal redundancies present in multi-view sequences, nor does it provide closed surfaces. The second method performs the search starting from an initial sampled surface that encloses the space containing the objects to be reconstructed. Searching in the direction opposite to the normals (pointing inwards) yields closed surfaces with an algorithm that exploits the temporal correlation of the scene to evolve successive 3D reconstructions over time. A drawback of this method is the set of topological operations on the initial surface, which in general cannot be applied efficiently on GPUs. The third sampling strategy is oriented towards parallelization (GPU) and the exploitation of temporal and spatial correlations in the search for surface samples. Defining an initial search space that includes the objects to be reconstructed, a few seed samples are first sought randomly on the surface of the objects.
Next, new surface samples are sought around each seed (an expansion process) until a sufficient density is reached. To improve the efficiency of the initial seed search, the search space is reduced by exploiting temporal correlations in multi-view sequences on the one hand and applying multi-resolution on the other. The expansion then proceeds, exploiting the spatial correlation in the distribution of the surface samples. The second part of the thesis presents a meshing algorithm that interpolates the surface between samples. Starting from an initial triangle connecting three coherently oriented points, the surface is iteratively expanded over the complete set of samples. Compared with the state of the art, the proposed method yields a very accurate reconstruction (it does not modify the position of the samples) and results in a correct topology. Moreover, it is fast enough to be used in interactive applications, unlike most available methods. The final results, applying both stages (sampling and interpolation), demonstrate the validity of the proposal. The experimental data show how the presented methodology makes it possible to obtain a fast, efficient (compression) and effective (complete) representation of the foreground elements of the scene.

  • Procesado de vídeo multicámara empleando información de la escena: aplicación a eventos deportivos, interacción visual y 3DTV

     Giro Nieto, Xavier; Oliveras Verges, Albert; Gasull Llampallas, Antoni; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Sayrol Clols, Elisa; Pardas Feliu, Montserrat; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Vilaplana Besler, Veronica; Casas Pla, Josep Ramon
    Participation in a competitive project

  • Best Paper Award, International Conference on 3D Imaging

     Salvador Marcos, Jordi; Casas Pla, Josep Ramon
    Award or recognition

  • Real-time 3D multi-person tracking using Monte Carlo surface sampling  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2010-06
    Presentation of work at congresses

    This paper presents a low-complexity approach to the problem of simultaneously tracking several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene. The proposed filtering technique estimates the centroid of a target using only a sparse set of points placed on its surface, making this set evolve over time based on the seminal particle filtering principle. In this case, the likelihood function is based on local neighborhood computations, thus drastically decreasing the computational load of the algorithm. In order to handle multiple interacting targets, a separate filter is assigned to each subject in the scenario, while a blocking scheme is employed to model their interactions. Tests over a standard annotated dataset yield quantitative results showing the effectiveness of the proposed technique in both accuracy and real-time performance.
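    The key economy of the surface-sampling idea is that a target's centroid can be estimated from a sparse set of points on its surface rather than from the full voxel volume. A self-contained sketch (the synthetic spherical surface and all numbers are assumptions for illustration; the real method also weights each sample with a local-neighborhood likelihood):

```python
import numpy as np

rng = np.random.default_rng(0)

def surface_samples(center, radius, n):
    """Draw n points on a spherical 'surface' around the true centroid,
    a stand-in for sparse surface voxels of a person's reconstruction."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return center + radius * v

def track_centroid(true_path, n_samples=50):
    """Estimate the moving centroid at each frame from the sparse
    surface samples only (their mean), never touching a full volume."""
    return [surface_samples(c, 0.3, n_samples).mean(axis=0) for c in true_path]

# A person drifting along x over 5 frames; compare estimate vs. truth.
path = [np.array([t * 0.1, 1.0, 2.0]) for t in range(5)]
estimates = track_centroid(path)
err = max(np.linalg.norm(e - c) for e, c in zip(estimates, path))
```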

  • Spatio-temporal alignment and hyperspherical Radon transform for 3D gait recognition in multi-view environments  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    IEEE Conference on Computer Vision and Pattern Recognition
    Presentation's date: 2010-06
    Presentation of work at congresses

    This paper presents a view-invariant approach to gait recognition in multi-camera scenarios exploiting a joint spatio-temporal data representation and analysis. First, multi-view information is employed to generate a 3D voxel reconstruction of the scene under study. The analyzed subject is tracked, and its centroid and orientation allow recentering and aligning the volume associated with it, thus obtaining a representation invariant to translation, rotation and scaling. Temporal periodicity of the walking cycle is extracted to align the input data in the time domain. Finally, the Hyperspherical Radon Transform is presented as an efficient tool to obtain features from spatio-temporal gait templates for classification purposes. Experimental results prove the validity and robustness of the proposed method for gait recognition tasks with several covariates.

  • Skeleton and shape adjustment and tracking in multicamera environments

     Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Conference on Articulated Motion and Deformable Objects
    Presentation's date: 2010-07-07
    Presentation of work at congresses

  • Joint estimation of shape and motion from silhouettes

     Salvador Marcos, Jordi; Casas Pla, Josep Ramon
    IEEE International Conference on Image Processing
    Presentation's date: 2010
    Presentation of work at congresses

  • Surface reconstruction by restricted and oriented propagation

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Image Processing
    Presentation's date: 2010-09-27
    Presentation of work at congresses

  • Virtual view appearance representation for human motion analysis in multi-view environments

     Lopez Mendez, Adolfo; Canton Ferrer, Cristian; Casas Pla, Josep Ramon
    European Signal Processing Conference
    Presentation's date: 2010-08-25
    Presentation of work at congresses

    We propose a view-invariant representation of human appearance in multi-view scenarios consisting of a new set of views that overcomes the view-dependency and moderate occlusion problems of fixed cameras. First, a 3D reconstruction of the scene is generated, from which we can track multiple persons in the scenario. For each tracked subject, we define a set of virtual views by projecting its associated 3D volume. The synthetic views can be generated in convenient directions to detect and classify a number of gestures useful in assistive and smart environments. Experimental results of the representation and event detection in a multi-camera environment prove the effectiveness of the proposed method.

  • From silhouettes to 3D points to mesh: towards free viewpoint video

     Salvador Marcos, Jordi; Suau Cuadros, Xavier; Casas Pla, Josep Ramon
    ACM Workshop on 3D Video Processing (3DVP)
    Presentation's date: 2010-10-29
    Presentation of work at congresses

  • Photo-consistent surfaces from a sparse set of viewpoints

     Salvador Marcos, Jordi; Casas Pla, Josep Ramon
    IEEE International Conference on Image Processing
    Presentation's date: 2010-12
    Presentation of work at congresses

    We address the problem of reconstructing 3D shapes from color data available in a sparse set of views from all directions of a scene. As an advantage when compared to multi-view stereo approaches, our method is able to reconstruct object surfaces from a small number of views in wide-baseline setups. This introduces a trade-off between reconstruction accuracy and spatial coverage. The proposed algorithm obtains candidate surface points using a photo-consistency test and restricting the analysis to foreground pixels. The final surface points are extracted by iteratively carving away candidate points that are not photo-consistent with the complete multi-view set. Finally, a surface patch is obtained from each colored surface point by estimating its orientation. Experimental results reflect the validity of the approach by comparing it to a voxelized implementation.
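    A photo-consistency test of the kind described can be sketched very simply (an illustrative toy, not the paper's exact criterion): a candidate 3D point passes when the colors it projects to in the different views agree, e.g. when their per-channel spread stays below a threshold. The threshold value and sample colors are invented for the example.

```python
import numpy as np

def photo_consistent(colors, tau=15.0):
    """A candidate surface point passes when the colors observed from
    the different views agree: per-channel standard deviation below a
    threshold tau (in grey levels). `colors` has shape (n_views, 3)."""
    colors = np.asarray(colors, dtype=float)
    return bool(np.all(colors.std(axis=0) < tau))

# A point on a matte surface: all views see nearly the same color.
on_surface = [[200, 40, 40], [202, 38, 41], [199, 41, 39]]
# A point floating in free space: views see unrelated background colors.
in_free_space = [[200, 40, 40], [30, 90, 210], [120, 120, 120]]
```

    Iterative carving then repeatedly discards candidates that fail this test against the full set of views, as the abstract describes.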

  • Improved 3D reconstruction in smart-room environments using ToF imaging

     Gudmundsson, Sigurjon Arni; Pardas Feliu, Montserrat; Casas Pla, Josep Ramon; Sveinsson, Johannes; Aanaes, Henrik; Larsen, Rasmus
    Computer vision and image understanding
    Date of publication: 2010-12
    Journal article

  • Computers in the human interaction loop

     Waibel, Alex; Stiefelhagen, Rainer; Carlson, Rolf; Casas Pla, Josep Ramon; Kleindienst, Jan; Lamel, Lori; Lanz, Oswald; Mostefa, Djamel; Omologo, Maurizio; Pianesi, Fabio; Polymenakos, Lazaros; Potamianos, Gerasimos; Soldatos, John; Sutschet, Gerard; Terken, Jacques
    Date of publication: 2010
    Book chapter

  • Information theoretical region merging approaches and fusion of hierarchical image segmentation results

     Calderero Patino, Felipe
    Defense's date: 2010-02-12
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Adquisición multicámara para Free Viewpoint Video (MC4FVV)

     Pardas Feliu, Montserrat; Giro Nieto, Xavier; Vilaplana Besler, Veronica; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Gasull Llampallas, Antoni; Oliveras Verges, Albert; Sayrol Clols, Elisa; Casas Pla, Josep Ramon
    Participation in a competitive project

  • Best student paper award ICIP 2010

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    Award or recognition

  • FascinatE D1.1.1 End user, production and hardware and networking requirements

     Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon; Suau Cuadros, Xavier; Gibb, Andrew; Niamut, O.A.; Prins, M.J.; Zoric, G.; Engström, A.; Perry, M.; Önnevall, E.; Juhlin, O.; Macq, Jean François
    Date: 2010-09-01
    Report

  • Format-Agnostic SCript-based INterAcTive Experience

     Casas Pla, Josep Ramon; Morros Rubió, Josep Ramon; Marques Acosta, Fernando; Pardas Feliu, Montserrat; Ruiz Hidalgo, Javier
    Participation in a competitive project

  • Marker-based human motion capture in multi-view sequences

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Eurasip journal on advances in signal processing
    Date of publication: 2010-12
    Journal article

  • Skeleton and shape adjustment and tracking in multicamera environments

     Alcoverro Vidal, Marcel; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Lecture notes in computer science
    Date of publication: 2010-07
    Journal article

    In this paper we present a method for automatic body model adjustment and motion tracking in multicamera environments. We introduce a set of shape deformation parameters based on linear blend skinning, which allow deformations related to the scaling of the individual bones of the body model skeleton, as well as deformations in the radial direction of a bone. The adjustment of a generic body model to a specific subject is achieved by estimating these shape deformation parameters. The estimation combines a local optimization method with hierarchical particle filtering, and uses an efficient cost function based on foreground silhouettes computed on the GPU. Anthropometric constraints are taken into account by rejection sampling during particle propagation. We further propose a hierarchical particle filtering method for motion tracking using the adjusted model, and show accurate model adjustment and tracking for distinct subjects in a five-camera setup.
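    The shape deformation described in this abstract, scaling each bone along its axis and in the radial direction under linear blend skinning, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (the parameter names `s_len` and `s_rad` are hypothetical), not the authors' implementation:

    ```python
    def skin_vertex(v, bones, weights):
        """Linear blend skinning with per-bone deformation parameters.

        Each bone is (origin, axis, s_len, s_rad): a unit axis through origin,
        a scale s_len along the bone and a scale s_rad in the radial direction.
        The deformed vertex is the weight-averaged result over all bones.
        """
        out = [0.0, 0.0, 0.0]
        for (origin, axis, s_len, s_rad), w in zip(bones, weights):
            # express v relative to the bone origin
            rel = [v[i] - origin[i] for i in range(3)]
            # split into the component along the bone axis and the radial rest
            dot = sum(rel[i] * axis[i] for i in range(3))
            along = [dot * axis[i] for i in range(3)]
            radial = [rel[i] - along[i] for i in range(3)]
            # scale the axial part (bone length) and the radial part (limb
            # thickness), map back to world space, accumulate by skin weight
            for i in range(3):
                out[i] += w * (origin[i] + s_len * along[i] + s_rad * radial[i])
        return out

    # With identity scales the vertex is unchanged; doubling s_rad pushes the
    # vertex away from the bone axis, mimicking a thicker limb.
    bone = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, 2.0)
    thicker = skin_vertex((1.0, 2.0, 3.0), [bone], [1.0])
    ```

    Estimating `s_len` and `s_rad` per bone, as the paper does via local optimization plus hierarchical particle filtering, then adapts one generic model to different subjects without remeshing.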

  • Integration of audiovisual sensors and technologies in a smart room  Open access

     Neumann, J; Casas Pla, Josep Ramon; Macho, D; Ruiz Hidalgo, Javier
    Personal and ubiquitous computing
    Date of publication: 2009-01
    Journal article

    At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure and indicates possibilities for high- and low-level multi-modal interaction. An example of usage of the information collected from the distributed sensor network is explained in detail: the system supports a group of students who have to solve a problem related to a lab assignment.

  • Human motion capture with scalable body models.

     Canton Ferrer, Cristian
    Defense's date: 2009-07-21
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Multi-resolution illumination compensation for foreground extraction  Open access  awarded activity

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Image Processing
    Presentation's date: 2009-11-10
    Presentation of work at congresses

    Illumination changes may lead to false foreground (FG) segmentation and tracking results. Most existing FG extraction algorithms obtain a background (BG) estimate from temporal statistical parameters. Such algorithms assume a quasi-static BG that changes only slowly. As a result, fast illumination changes are not absorbed by the BG estimator and are classified as FG. The aim of the proposed algorithm is to reduce illumination effects in video sequences in order to improve foreground segmentation performance.
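    The failure mode this abstract addresses, a slowly adapting background model mistaking a fast illumination change for foreground, is easy to reproduce with a basic running-average background estimator. The following is our own minimal sketch of that baseline (per-pixel grayscale values, hypothetical threshold), not the paper's compensation algorithm:

    ```python
    def update_background(bg, frame, alpha=0.02):
        # Exponential running average: with a small alpha the background
        # adapts only slowly -- the quasi-static assumption described above.
        return [(1.0 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

    def foreground_mask(bg, frame, thresh=30.0):
        # A pixel is flagged foreground when it deviates from the model.
        return [abs(f - b) > thresh for b, f in zip(bg, frame)]

    # A sudden global brightness jump of +60 exceeds the threshold at every
    # pixel, so the whole frame is wrongly flagged as foreground: one update
    # step moves the background by only alpha * 60 = 1.2 gray levels.
    bg = [100.0, 100.0, 100.0, 100.0]
    bright = [160.0, 160.0, 160.0, 160.0]
    mask = foreground_mask(bg, bright)
    ```

    Compensating the illumination change before differencing, as the paper proposes, keeps the deviation below threshold and avoids the spurious foreground.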

  • Voxel based annealed particle filtering for markerless 3D articulated motion capture  Open access

     Canton Ferrer, Cristian; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    3DTV Conference
    Presentation's date: 2009-05-06
    Presentation of work at congresses

    This paper presents a view-independent approach to markerless human motion capture in low resolution sequences from multiple calibrated and synchronized cameras. Redundancy among cameras is exploited to generate a 3D voxelized representation of the scene, and a human body model (HBM) is introduced to analyze these data. An annealed particle filtering scheme is employed in which every particle encodes an instance of the pose of the HBM. The likelihood of each particle given the input data is evaluated using occupancy and surface information, and kinematic constraints are imposed in the propagation step to avoid impossible poses. Tests on the annotated HumanEva dataset yield quantitative results showing the effectiveness of the proposed algorithm.
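    An annealed particle filter of the kind described here can be sketched generically: each annealing layer sharpens the likelihood and shrinks the diffusion noise, so particles progressively concentrate on the best pose. Below is a one-dimensional toy version of that scheme, our own sketch; the paper operates on full articulated body poses with voxel occupancy and surface likelihoods and rejects kinematically invalid particles, none of which is modeled here:

    ```python
    import math
    import random

    def annealed_particle_filter(particles, likelihood, layers=4, sigma0=1.0):
        """Generic 1-D annealed particle filter sketch."""
        for m in range(layers):
            beta = (m + 1) / layers      # annealing exponent: flat -> sharp
            sigma = sigma0 * (0.5 ** m)  # diffusion noise shrinks per layer
            # propagation step (the paper additionally rejects particles that
            # violate kinematic constraints at this point)
            moved = [p + random.gauss(0.0, sigma) for p in particles]
            # weight each particle by the annealed likelihood of its pose
            w = [likelihood(p) ** beta for p in moved]
            total = sum(w) or 1.0
            # resample proportionally to the normalized weights
            particles = random.choices(moved, weights=[x / total for x in w],
                                       k=len(moved))
        return particles

    # Toy run: particles start spread over [-5, 5] and should concentrate
    # around the likelihood peak at pose = 2.
    random.seed(0)
    start = [random.uniform(-5.0, 5.0) for _ in range(200)]
    final = annealed_particle_filter(start, lambda x: math.exp(-(x - 2.0) ** 2))
    mean = sum(final) / len(final)
    ```

    The early layers, with a flattened likelihood (small `beta`), let particles escape local optima; the final layer behaves like a standard particle filter around the surviving modes.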