Scientific and technological production

1 to 50 of 79 results
  • Interactive rendering

     Ruiz Hidalgo, Javier; Borsum, Malte; Kochale, Axel; Zoric, Goranka
    Date of publication: 2014-01-01
    Book chapter

    Presents current trends and potential future developments by leading researchers in immersive media production, delivery, rendering and interaction. The underlying audio and video processing technology discussed in the book relates to areas such as 3D object extraction, audio event detection, 3D sound rendering, face detection, and gesture analysis and tracking using video and depth information. The book gives an insight into current trends and developments of future media production, delivery and reproduction. Consideration of the complete production, processing and distribution chain allows a full picture to be presented to the reader. Production developments covered include integrated workflows developed by researchers and industry practitioners, as well as capture of ultra-high-resolution panoramic video and 3D object-based audio across a range of programme genres. Distribution developments include script-based, format-agnostic network delivery to a full range of devices, from large-scale public panoramic displays with wavefield synthesis and ambisonic audio reproduction to 'small screen' mobile devices. Key developments at the consumer end of the chain apply to both passive and interactive viewing modes and incorporate user interfaces such as gesture recognition and 'second screen' devices to allow manipulation of the audiovisual content.

  • Media production, delivery and interaction for platform independent systems

    Date of publication: 2014-01-01
    Book

    Presents current trends and potential future developments by leading researchers in immersive media production, delivery, rendering and interaction. The underlying audio and video processing technology discussed in the book relates to areas such as 3D object extraction, audio event detection, 3D sound rendering, face detection, and gesture analysis and tracking using video and depth information. The book gives an insight into current trends and developments of future media production, delivery and reproduction. Consideration of the complete production, processing and distribution chain allows a full picture to be presented to the reader. Production developments covered include integrated workflows developed by researchers and industry practitioners, as well as capture of ultra-high-resolution panoramic video and 3D object-based audio across a range of programme genres. Distribution developments include script-based, format-agnostic network delivery to a full range of devices, from large-scale public panoramic displays with wavefield synthesis and ambisonic audio reproduction to 'small screen' mobile devices. Key developments at the consumer end of the chain apply to both passive and interactive viewing modes and incorporate user interfaces such as gesture recognition and 'second screen' devices to allow manipulation of the audiovisual content.

  • Gesture control interface for immersive panoramic displays

     Alcoverro Vidal, Marcel; Suau, Xavier; Morros Rubió, Josep Ramon; López Méndez, Adolfo; Gil, Albert; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Multimedia tools and applications
    Date of publication: 2013-07-25
    Journal article

    In this paper, we propose a gesture-based interface designed to interact with panoramic scenes. The system combines novel static gestures with a fast hand tracking method. Our proposal is to use static gestures as shortcuts to activate functionalities of the system (i.e. volume up/down, mute, pause, etc.), and hand tracking to freely explore the panoramic video. The overall system is multi-user, and incorporates a user identification module based on face recognition, which is able both to recognize returning users and to add new users online. The system exploits depth data, making it robust to challenging illumination conditions. We show through experimental results the performance of every component of the system compared to the state of the art. We also show the results of a usability study performed with several untrained users.

    This article is available at: http://link.springer.com/article/10.1007%2Fs11042-013-1605-7
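
    The split described in the abstract above — static gestures acting as command shortcuts while continuous hand tracking drives free exploration — can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation; the gesture labels, command names and state layout are all hypothetical.

```python
# Illustrative sketch only: static gestures map to shortcut commands,
# anything else is treated as hand tracking that pans the panoramic view.
# All names (gesture labels, commands, state keys) are made up for this example.

COMMANDS = {
    "open_palm": "pause",
    "fist": "mute",
    "thumb_up": "volume_up",
    "thumb_down": "volume_down",
}

def dispatch(gesture, hand_xy, state):
    """Apply a static-gesture shortcut if one is recognized; otherwise
    interpret the tracked hand displacement as a pan of the view center."""
    if gesture in COMMANDS:
        state["last_command"] = COMMANDS[gesture]
    else:
        # free exploration: hand displacement pans the panoramic scene
        cx, cy = state["view_center"]
        state["view_center"] = (cx + hand_xy[0], cy + hand_xy[1])
    return state

state = {"last_command": None, "view_center": (0.0, 0.0)}
state = dispatch("fist", (0, 0), state)      # shortcut: mute
state = dispatch(None, (5.0, -2.0), state)   # tracking: pan the view
```

A real system would of course run this per tracked user, gated by the face-recognition identification module the abstract mentions.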

  • Gesture controlled interactive rendering in a panoramic scene  Open access

     Kochale, Axel; Ruiz Hidalgo, Javier; Borsum, Malte
    European Interactive TV Conference
    Presentation's date: 2013-06-24
    Presentation of work at congresses

    The demonstration described hereafter covers technical work carried out in the FascinatE project [1], related to the interactive retrieval and rendering of high-resolution panoramic scenes. The scenes have been captured by a special panoramic camera (the OMNICAM) [2], which captures high-resolution video with a wide-angle (180 degrees) field of view. Users can access the content through a novel device-less and markerless gesture-based system that allows them to interact as naturally as possible, permitting the user to control the rendering of the scene by zooming, panning or framing through the panoramic scene.

  • Gesture interaction with rich TV content in the social setting

     Zoric, Goranka; Engström, Arvid; Barkhuus, Louise; Ruiz Hidalgo, Javier; Kochale, Axel
    ACM SIGCHI Conference on Human Factors in Computing Systems
    Presentation's date: 2013-04-27
    Presentation of work at congresses

    The appearance of new immersive TV content has increased the interactive possibilities presented to the viewers. Increased interactivity is seen as a valuable feature in viewing richer television content, but new functionalities are limited by what can be done naturally and intuitively using available devices like remote controls. Therefore, new interaction techniques, such as visual gesture control systems, have appeared, aiming to enhance the viewers' viewing experience. In this work we begin uncovering the potential and challenges of gesture interaction with ultra-high-definition video for people watching TV together. As a first step we have done a study with a group of people interacting with such content using a gesture-based system in the home environment.

  • Fusion of colour and depth partitions for depth map coding

     Maceira Duch, Marc; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier
    International Conference on Digital Signal Processing
    Presentation's date: 2013-07-02
    Presentation of work at congresses

    3D video coding includes the use of multiple color views and depth maps associated to each view. An adequate coding of depth maps should be adapted to their characteristics: smooth regions and sharp edges. In this paper a segmentation-based technique is proposed for improving depth map compression while preserving the main discontinuities, exploiting the color-depth similarity of 3D video. An initial coarse depth map segmentation is used to locate the main discontinuities in depth. The resulting partition is improved by fusing a color partition. We assume that the color image is encoded first and available when the associated depth map is encoded, so the color partition can be segmented in the decoder without introducing any extra cost. A new segmentation criterion inspired by superpixel techniques is proposed to obtain the color partition. Initial experimental results show similar compression efficiency to HEVC, with significant potential for further improvements.

  • Bayesian region selection for adaptive dictionary-based Super-Resolution

     Pérez-Pellitero, Eduardo; Salvador, Jordi; Ruiz Hidalgo, Javier; Rosenhahn, Bodo
    British Machine Vision Conference
    Presentation's date: 2013-09-30
    Presentation of work at congresses

    The performance of dictionary-based super-resolution (SR) strongly depends on the contents of the training dataset. Nevertheless, many dictionary-based SR methods randomly select patches from a larger set of training images to build their dictionaries, thus relying on patches being diverse enough. This paper describes an external-dictionary SR algorithm based on adaptively selecting an optimal subset of patches out of the training images. Each training image is divided into sub-image entities, named regions, of such size that texture consistency is preserved. For each input patch to super-resolve, the best-fitting region (with enough high-frequency energy) is found through a Bayesian selection. In order to handle the high number of regions in the training dataset, a local Naive Bayes Nearest Neighbor (NBNN) approach is used. Trained with this adapted subset of patches, sparse coding SR is applied to recover the high-resolution image. Experimental results demonstrate that using our adaptive algorithm produces an improvement in SR performance with respect to non-adaptive training.
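
    The NBNN-style region selection mentioned in this abstract can be illustrated with a minimal sketch: score each candidate region by how well its descriptors explain the query patch's descriptors, then keep the best-scoring region. This is a simplified stand-in, not the paper's exact local NBNN formulation; the toy descriptors below are invented.

```python
import numpy as np

def nbnn_select_region(query_descs, region_descs):
    """NBNN-style selection (simplified sketch): score each candidate region
    by summing, over the query descriptors, the squared distance to that
    region's nearest descriptor; return the index of the best (lowest) region."""
    scores = []
    for descs in region_descs:
        # squared distance from every query descriptor to every region descriptor
        d2 = ((query_descs[:, None, :] - descs[None, :, :]) ** 2).sum(-1)
        scores.append(d2.min(axis=1).sum())  # nearest-neighbor term per descriptor
    return int(np.argmin(scores))

# toy example: region 1's descriptors match the query exactly
query = np.array([[0.0, 1.0], [1.0, 0.0]])
regions = [np.array([[5.0, 5.0]]), np.array([[0.0, 1.0], [1.0, 0.0]])]
best = nbnn_select_region(query, regions)  # → 1
```

In the paper this selection feeds an adapted patch subset into sparse-coding SR; here it only shows the nearest-neighbor scoring idea.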

  • Towards a format-agnostic approach for production, delivery and rendering of immersive media

     Niamut, O.A.; Kaiser, R.; Kienast, G.; Kochale, Axel; Spille, J.; Schreer, Oliver; Ruiz Hidalgo, Javier; Macq, Jean François; Shirley, Ben
    ACM Multimedia Systems
    Presentation's date: 2013-03-01
    Presentation of work at congresses

    The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalization (selection and control of content, availability on many devices). We investigate the feasibility of an end-to-end format-agnostic approach to support both these trends. In this paper, different aspects of a format-agnostic capture, production, delivery and rendering system are discussed. At the capture stage, the concept of layered scene representation is introduced, including panoramic video and 3D audio capture. At the analysis stage, a virtual director component is discussed that allows for automatic execution of cinematographic principles, using feature tracking and saliency detection. At the delivery stage, resolution-independent audiovisual transport mechanisms for both managed and unmanaged networks are treated. In the rendering stage, a rendering process that includes the manipulation of audiovisual content to match the connected display and loudspeaker properties is introduced. Different parts of the complete system are revisited demonstrating the requirements and the potential of this advanced concept.

  • Detecting end-effectors on 2.5D data using geometric deformable models: Application to human pose estimation

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Computer vision and image understanding
    Date of publication: 2013-03
    Journal article

  • Human body analysis using depth data

     Suau Cuadros, Xavier
    Defense's date: 2013-12-04
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    Human body analysis is one of the broadest areas in the field of computer vision. Researchers have put great effort into human body analysis, especially over the last decade, thanks to major technological advances in both cameras and computing power. Human body analysis covers several topics, such as person detection and segmentation, body motion tracking, and action recognition. Although human beings perform these tasks naturally, they become difficult problems when tackled from a computer vision perspective. Adverse situations, such as viewpoint changes, occlusions, illumination conditions or behavioral variability between people, make human body analysis a complicated task. In computer vision, the evolution of research is often tied to technological progress, both in sensors and in the computing power of computers. Traditional human body analysis methods are based on color cameras, which strongly limits the possible approaches, since the only available information comes from color data. The multiview concept was an important qualitative leap: in multiview approaches, multiple cameras record the same scene simultaneously, making it possible to use 3D information thanks to stereo combination algorithms. Having 3D information is a key point, since the human body moves in a three-dimensional space, so problems such as occlusions can be mitigated when 3D information is available. The appearance of commercial depth cameras has been a second leap in the field of human body analysis. While traditional multiview methods require a cumbersome and expensive setup and a precise calibration of all cameras, the new depth cameras directly provide 3D information with a single sensor. These cameras can be installed quickly in a wide variety of environments, enormously broadening the spectrum of applications, which was very limited with multiview approaches. Moreover, since depth cameras are based on infrared light, they do not suffer from problems related to illumination changes.
    In this thesis, we focus on the study of the information provided by depth cameras and its application to the human body analysis problem. We propose new ways to describe depth data through specific descriptors, capable of emphasizing characteristics of the scene that are useful for subsequent human body analysis. These descriptors exploit the 3D structure of depth data to outperform generalist 3D or color-based descriptors. We also study the person detection problem, proposing a robust and fast head detection method. We extend this method to obtain a hand tracking algorithm that has been used throughout the thesis. In the final part of the document, we focus on hand analysis as a subarea of human body analysis. Owing to the recent appearance of depth cameras, there is a lack of public datasets. We contribute a dataset designed for finger localization and gesture recognition using depth data. This dataset is the starting point of two contributions on finger localization and gesture recognition based on classification techniques. In these methods, we also exploit the aforementioned descriptor proposals to better adapt to the nature of depth data.

  • INTAIRACT: Joint hand gesture and fingertip classification for touchless interaction

     Suau Cuadros, Xavier; Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    European Conference on Computer Vision
    Presentation's date: 2012-10-08
    Presentation of work at congresses

    In this demo we present intAIRact, an online hand-based touchless interaction system. Interactions are based on easy-to-learn hand gestures that, combined with translations and rotations, render a user-friendly and highly configurable system. The main advantage with respect to existing approaches is that we are able to robustly locate and identify fingertips. Hence, we are able to employ a simple but powerful alphabet of gestures, not only by determining the number of visible fingers in a gesture, but also which fingers are being observed. To achieve such a system we propose a novel method that jointly infers hand gestures and fingertip locations using a single depth image from a consumer depth camera. Our approach is based on a novel descriptor for depth data, the Oriented Radial Distribution (ORD) [1]. On the one hand, we exploit the ORD for robust classification of hand gestures by means of efficient k-NN retrieval. On the other hand, maxima of the ORD are used to perform structured inference of fingertip locations. The proposed method outperforms other state-of-the-art approaches both in gesture recognition and fingertip localization. An implementation of the ORD extraction on a GPU yields a real-time demo running at approximately 17 fps on a single laptop.
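
    The k-NN retrieval step mentioned in this abstract can be sketched in a few lines: find the k training descriptors nearest to the query and vote on the gesture label. This is a generic illustration over made-up 2-D feature vectors, not the actual ORD descriptors or the demo's implementation.

```python
import numpy as np

def knn_classify(query, train_feats, train_labels, k=3):
    """k-NN gesture classification sketch: retrieve the k training feature
    vectors nearest to the query (Euclidean distance) and take a majority
    vote on their gesture labels."""
    d2 = ((train_feats - query) ** 2).sum(axis=1)  # squared distances
    nearest = np.argsort(d2)[:k]                   # indices of k nearest
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)        # majority label

# toy training set with two gesture classes (labels are invented)
feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
labels = ["fist", "fist", "open_palm", "open_palm"]
knn_classify(np.array([0.05, 0.02]), feats, labels, k=3)  # → "fist"
```

In the real system the feature vectors would be ORD descriptors extracted from a depth image, and retrieval would need to be efficient enough for the reported real-time rates.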

  • Oriented radial distribution on depth data: application to the detection of end-effectors  Open access

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE International Conference on Acoustics, Speech, and Signal Processing
    Presentation's date: 2012-03-27
    Presentation of work at congresses

    End-effectors are considered to be the main topological extremities of a given 3D body. Even if the nature of such a body is not restricted, this paper focuses on the human body case. Detection of human extremities is a key issue in the human motion capture domain, being needed to initialize and update the tracker. Therefore, the effectiveness of human motion capture systems usually depends on the reliability of the obtained end-effectors. The increasing accuracy, low cost and easy installation of depth cameras have opened the door to new strategies to overcome the body pose estimation problem. With the objective of detecting the head, hands and feet of a human body, we propose a new local feature computed from depth data, which gives an idea of its curvature and prominence. This feature is weighted depending on recent detections, providing also a temporal dimension. Based on this feature, some end-effector candidate blobs are obtained and classified into head, hands and feet according to three probabilistic descriptors.
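
    The intuition behind a curvature/prominence feature on depth data — extremities like hands and feet have one-sided neighborhoods, flat or interior points do not — can be illustrated with a much simpler stand-in than the paper's actual feature. The function below is an invented toy, not the proposed descriptor.

```python
import numpy as np

def prominence(points, idx, radius=1.0):
    """Very simplified stand-in for a prominence feature on a 3D point cloud
    (NOT the paper's feature): measure how one-sided the neighborhood of a
    point is. At an extremity, the neighbors within `radius` lie mostly on
    one side, so the norm of their mean offset is large relative to the radius;
    at an interior point the offsets cancel out."""
    p = points[idx]
    offsets = points - p
    near = offsets[np.linalg.norm(offsets, axis=1) < radius]
    return np.linalg.norm(near.mean(axis=0)) / radius

# toy cloud: a straight "limb"; its tip scores higher than its middle
limb = np.array([[float(i), 0.0, 0.0] for i in range(5)])
prominence(limb, 4, radius=1.5) > prominence(limb, 2, radius=1.5)  # → True
```

Thresholding such a score over the whole cloud would yield candidate extremity blobs, analogous in spirit to the candidate stage described above.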

  • Depth map coding based on an optimal hierarchical region representation

     Maceira Duch, Marc; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon
    3DTV Conference
    Presentation's date: 2012-10-15
    Presentation of work at congresses

    Multiview color information used jointly with depth maps is a widespread technique for 3D video. Using this depth information, 3D functionalities such as free-viewpoint video can be provided by means of depth-image-based rendering techniques. In this paper, a new technique to encode depth maps is proposed. Based on the usually smooth structure and the sharp edges of depth maps, our proposal segments the depth map into homogeneous regions of arbitrary shape and encodes the contents of these regions using different texture coding strategies. An optimal Lagrangian approach is applied to the hierarchical region representation provided by our segmentation technique. This approach automatically selects the best encoding strategy for each region and the optimal partition to encode the depth map. To avoid the high cost of coding the resulting partition, a prediction is made using the associated decoded color image.
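
    The per-region Lagrangian selection described above can be sketched generically: each region has several candidate coding strategies, each with a distortion and a rate, and the encoder keeps the one minimizing J = D + λR. The region names, modes and numbers below are purely illustrative; the paper additionally optimizes over the hierarchical partition itself, which this sketch omits.

```python
def select_modes(regions, lam):
    """Lagrangian rate-distortion mode selection sketch: for each region,
    pick the coding strategy minimizing J = D + lambda * R.
    `regions` maps a region id to {mode: (distortion, rate)}."""
    chosen = {}
    for rid, modes in regions.items():
        chosen[rid] = min(modes, key=lambda m: modes[m][0] + lam * modes[m][1])
    return chosen

# illustrative numbers: a smooth region is cheapest as a constant,
# an edge region needs the richer (more expensive) model
regions = {
    "sky":  {"constant": (2.0, 1.0), "planar": (0.5, 8.0)},
    "edge": {"constant": (9.0, 1.0), "planar": (1.0, 8.0)},
}
select_modes(regions, lam=0.5)  # → {"sky": "constant", "edge": "planar"}
```

Sweeping λ trades bitrate against distortion, which is how the single multiplier controls the operating point of the whole depth-map encoder.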

  • Variational reconstruction and restoration for video super-resolution

     Salvador, Jordi; Rivero, Daniel; Kochale, Axel; Ruiz Hidalgo, Javier
    International Conference on Pattern Recognition
    Presentation's date: 2012-11-13
    Presentation of work at congresses

    This paper presents a variational framework for obtaining super-resolved video sequences, based on the observation that reconstruction-based Super-Resolution (SR) algorithms are limited by two factors: registration exactitude and Point Spread Function (PSF) estimation accuracy. To minimize the impact of the first limiting factor, a small-scale linear in-painting algorithm is proposed to provide smooth SR video frames. To improve the second limiting factor, a fast PSF local estimation and total-variation-based denoising is proposed. Experimental results reflect the improvements provided by the proposed method when compared to classic SR approaches.

  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE transactions on multimedia
    Date of publication: 2012-06
    Journal article

    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.
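
    The depth-based template matching with a search zone around the previous estimate, as described in this abstract, can be sketched as a small sliding-window search. This toy uses a fixed search radius and a plain SSD criterion; the paper's adaptive search zone and robustness mechanisms are not reproduced, and all sizes and values are invented.

```python
import numpy as np

def track_head(depth, template, prev_xy, search=2):
    """Depth-based template matching sketch: scan a small zone around the
    previous head estimate and return the top-left position whose depth
    window best matches the head template (lowest sum of squared differences).
    The fixed `search` radius is a simplification of an adaptive search zone."""
    th, tw = template.shape
    best, best_xy = np.inf, prev_xy
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = prev_xy[0] + dy, prev_xy[1] + dx
            if 0 <= y <= depth.shape[0] - th and 0 <= x <= depth.shape[1] - tw:
                ssd = ((depth[y:y+th, x:x+tw] - template) ** 2).sum()
                if ssd < best:
                    best, best_xy = ssd, (y, x)
    return best_xy

# toy depth frame: far background at 5 m, a near 2x2 "head" blob at 1 m
depth = np.full((8, 8), 5.0)
depth[3:5, 4:6] = 1.0
template = np.full((2, 2), 1.0)
track_head(depth, template, (2, 3))  # → (3, 4)
```

Anchoring a hand-detection bounding box to the returned head position, as the paper does, then lets the user move freely while hands are searched in a body-relative region.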

  • End user, production and hardware and network requirements. FascinatE deliverable D1.1.2

     Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon; Suau Cuadros, Xavier; Gibb, Andrew; Prins, M.J.; Zoric, G.; Engström, A.; Perry, M.; Önnevall, E.; Juhlin, O.; Hannerfors, P.; Macq, Jean François; Schreer, Oliver
    Date: 2012-02-14
    Report

  • Report on interim demonstration. FascinatE deliverable D6.2.1

     Schreer, Oliver; Thomas, Graham; Thallinger, Georg; Kienast, G.; Oldfield, Rob; Ruiz Hidalgo, Javier; Macq, Jean François
    Date: 2012-07-25
    Report

  • Requirements for the network interfaces and interactive systems usability. FascinatE deliverable D5.3.1

     Rondao Alface, P.; Macq, Jean François; Verzijp, Nico; Zoric, G.; Önnevall, E.; Ruiz Hidalgo, Javier; Spille, J.; Oldfield, Rob; van Brandenburg, R.
    Date: 2012-01-31
    Report

  • Interim System Specification. FascinatE deliverable D1.4.2

     Thomas, Graham; Schreer, Oliver; Shirley, Ben; Oldfield, Rob; Kaiser, R.; Bailer, W; Steurer, Johannes; Kienast, G.; Poggi, A.; Macq, Jean François; Zoric, G.; Ruiz Hidalgo, Javier; Niamut, O.A.; Prins, M.J.; Borsum, M.; Spille, J.; Kochale, Axel
    Date: 2012-01-13
    Report

  • Interim System Integration. FascinatE deliverable D1.5.2

     Schreer, Oliver; Feldmann, I.; Weissig, Ch.; Finn, A.; Spille, J.; Steurer, Johannes; Kochale, Axel; Ruiz Hidalgo, Javier; Thomas, Graham; Gibb, Andrew; Macq, Jean François; Rondao Alface, P.; Prins, M.J.; Mathew, S.; Niamut, O.A.; Kaiser, R.; Weiss, Thomas; Bailer, W; Oldfield, Rob; Shirley, Ben
    Date: 2012-08-27
    Report

  • Dissemination and exploitation plan (including standardisation). FascinatE deliverable D7.1.3b

     Niamut, O.A.; Kienast, G.; Thallinger, Georg; Schreer, Oliver; Kochale, Axel; Ruiz Hidalgo, Javier; Thomas, Graham; Oldfield, Rob; Macq, Jean François; Masetti, Marco
    Date: 2012-02-16
    Report


  • AV renderer with enhanced processing with integration of scripting language. FascinatE deliverable D5.1.3

     Kochale, Axel; Borsum, M.; Spille, J.; Kropp, H.; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Gil, Albert; Macq, Jean François; Oldfield, Rob
    Date: 2012-09-03
    Report


  • Advanced visual rendering, gesture-based interaction and distributed delivery for immersive and interactive media services

     Niamut, O.A.; Kochale, Axel; Ruiz Hidalgo, Javier; Macq, Jean François; Kienast, G.
    International Broadcasting Convention
    Presentation's date: 2011-09-11
    Presentation of work at congresses

  • Format-agnostic approach for production, delivery and rendering of immersive media

     Schreer, Oliver; Thomas, Graham; Niamut, O.A.; Macq, Jean François; Kochale, Axel; Batke, J.M.; Ruiz Hidalgo, Javier; Oldfield, Rob; Shirley, Ben; Thallinger, Georg
    Networked and Electronic Media
    Presentation's date: 2011-09-27
    Presentation of work at congresses

    The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalisation (selection and control of content, availability on many devices). A capture, production, delivery and rendering system capable of supporting both these trends is being developed by the EU-funded FascinatE project. In this paper, different aspects of the format agnostic approach are discussed which we believe can be a promising concept for future media production and consumption. The different parts of the complete multimedia production and delivery process are revisited demonstrating the requirements and the potential of such an advanced concept.

  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Multimedia and Expo
    Presentation's date: 2011-07
    Presentation of work at congresses

    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on 2.5D data from a range camera, which is exploited to resolve ambiguities and overlaps. Experimental results show high robustness against partial occlusions and fast movements. The estimated positions are fairly stable, allowing the extraction of accurate trajectories which may be used for gesture classification purposes.

  • Surface reconstruction for multi-view video  Open access

     Salvador Marcos, Jordi
    Defense's date: 2011-09-23
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    This thesis introduces a methodology for obtaining an alternative representation of video sequences captured by calibrated multi-camera systems in controlled environments with known scene background. This representation consists in a 3D description of the surfaces of foreground objects, which allows for the recovery of part of the 3D information of the original scene lost in the projection process in each camera. The choice of the type of representation and the design of the reconstruction techniques are driven by three requirements that appear in smart rooms or recording studios. In these scenarios, video sequences captured by a multi-camera rig are used both for analysis applications and interactive visualization methods. The requirements are: the reconstruction method must be fast in order to be usable in interactive applications, the surface representation must provide a compression of the multi-view data redundancies, and this representation must also provide all the relevant information to be used for analysis applications as well as for free-viewpoint video. Once foreground and background are segregated for each view, the reconstruction process is divided in two stages. The first one obtains a sampling of the foreground surfaces (including orientation and texture), whereas the second provides closed, continuous surfaces from the samples, through interpolation. The sampling process is interpreted as a search for 3D positions that result in feature matchings between different views. This search process can be driven by different mechanisms: an image-based approach, another one based on the deformation of a surface from frame to frame, or a statistical sampling approach where samples are searched around the positions of other detected samples, which is the fastest and easiest to parallelize of the three. A meshing algorithm is also presented, which allows for the interpolation of surfaces between samples. Starting from an initial triangle, which connects three coherently oriented points, an iterative expansion of the surface over the complete set of samples takes place. The proposed method presents a very accurate reconstruction and results in a correct topology. Furthermore, it is fast enough to be used interactively. The presented methodology for surface reconstruction permits obtaining a fast, compressed and complete representation of foreground elements in multi-view video, as reflected by the experimental results.

    This thesis presents techniques for defining a methodology to obtain an alternative representation of video sequences captured by calibrated multi-camera systems in controlled environments with a known scene background. As the title of the thesis suggests, this representation consists of a three-dimensional description of the surfaces of foreground objects. This approach to representing the multi-view data allows recovering part of the three-dimensional information of the original scene lost in the projection performed by each camera. The choice of representation and the design of the reconstruction techniques respond to three requirements that arise in controlled environments such as smart rooms or recording studios, where the sequences captured by the multi-camera system are used both for analysis applications and for different interactive visualization methods. The first requirement is that the reconstruction method must be fast, so that it can be used in interactive applications. The second is that the surface representation must be efficient, yielding a compression of the multi-view data. The third is that the representation must be effective, i.e. usable for analysis applications as well as for visualization. Once the foreground and background content of each view are separated (possible in controlled environments with a known background), the strategy followed in the thesis is to divide the reconstruction process into two stages. The first obtains a sampling of the surfaces (including orientation and texture). The second provides closed, continuous surfaces from the set of samples through an interpolation process.
The result of the first stage is a set of oriented points in 3D space that locally represent the position, orientation and texture of the surfaces visible to the camera set. The sampling process is interpreted as a search for 3D positions that produce matches of image features between different views. This search can be driven by different mechanisms, presented in the first part of the thesis. The first proposal is an image-based method that searches for surface samples along the half-line starting at each camera's centre of projection and passing through a given point of the corresponding image. This method adapts well to exploiting photo-consistency in a static scene and has favourable characteristics for GPU implementation (which is desirable), but it neither exploits the temporal redundancies present in multi-view sequences nor provides closed surfaces. The second method searches from an initial sampled surface that encloses the space containing the objects to be reconstructed. Searching in the direction opposite to the normals (pointing inwards) yields closed surfaces, with an algorithm that exploits the temporal correlation of the scene to evolve successive 3D reconstructions over time. A drawback of this method is the set of topological operations on the initial surface, which in general cannot be applied efficiently on GPUs. The third sampling strategy is oriented towards parallelization (GPU) and the exploitation of temporal and spatial correlations in the search for surface samples. Given an initial search space that includes the objects to be reconstructed, a few seed samples on the object surfaces are found by random search.
Next, new surface samples are searched around each seed (the expansion process) until a sufficient density is reached. To improve the efficiency of the initial seed search, the search space is reduced by exploiting temporal correlations in multi-view sequences on the one hand, and by applying multi-resolution on the other. The expansion then proceeds, exploiting the spatial correlation in the distribution of the surface samples. The second part of the thesis presents a meshing algorithm that interpolates the surface between the samples. Starting from an initial triangle connecting three coherently oriented points, the surface is iteratively expanded over the complete set of samples. Compared with the state of the art, the proposed method yields a very accurate reconstruction (it does not modify the positions of the samples) and results in a correct topology. Moreover, unlike most available methods, it is fast enough to be used in interactive applications. The final results, applying both stages (sampling and interpolation), demonstrate the validity of the proposal. The experimental data show that the presented methodology obtains a fast, efficient (compression) and effective (complete) representation of the foreground elements of the scene.
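The seed-triangle expansion of the meshing stage can likewise be sketched in 2D: starting from one triangle, the closest unused sample is repeatedly attached to a frontier edge until all samples are meshed. This is a deliberately simplified analogue for a small convex point set, not the thesis algorithm; all names are hypothetical.

```python
import math

def grow_mesh(points):
    # Toy 2D analogue of surface growing: seed triangle, then iterative
    # expansion by attaching the unused point closest to a frontier edge.
    used = [0, 1, 2]
    tris = [(0, 1, 2)]
    frontier = [(0, 1), (1, 2), (2, 0)]

    def dist(i, edge):
        # Distance from point i to the midpoint of a frontier edge.
        a, b = edge
        mx = (points[a][0] + points[b][0]) / 2
        my = (points[a][1] + points[b][1]) / 2
        return math.hypot(points[i][0] - mx, points[i][1] - my)

    while len(used) < len(points):
        rest = [i for i in range(len(points)) if i not in used]
        _, i, (a, b) = min((dist(i, e), i, e) for i in rest for e in frontier)
        tris.append((a, b, i))           # attach point i to edge (a, b)
        frontier.remove((a, b))          # the edge is now interior
        frontier += [(a, i), (i, b)]     # two new frontier edges
        used.append(i)
    return tris

# Four samples on a square mesh into the expected two triangles.
tris = grow_mesh([(0, 0), (1, 0), (1, 1), (0, 1)])
```

Note how, as in the thesis, sample positions are never modified: the mesh only connects existing samples.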

  • Procesado de vídeo multicámara empleando información de la escena: aplicación a eventos deportivos, interacción visual y 3DTV

     Giro Nieto, Xavier; Oliveras Verges, Albert; Gasull Llampallas, Antoni; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Sayrol Clols, Elisa; Pardas Feliu, Montserrat; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Vilaplana Besler, Veronica; Casas Pla, Josep Ramon
    Participation in a competitive project


  • First system integration. Fascinate deliverable D1.5.1

     Schreer, Oliver; Feldmann, I.; Finn, A.; Spille, J.; Kochale, Axel; Ruiz Hidalgo, Javier; Gibb, Andrew; Steurer, Johannes; Thomas, Graham; Oldfield, Rob; Shirley, Ben; Thaler, M.; Macq, Jean François; Rondao Alface, P.; Prins, M.J.; Matthew, S.; Niamut, O.A.
    Date: 2011-10-31
    Report


  • Metadata and knowledge models and tools. Fascinate deliverable D3.1.2

     Bailer, W.; Kaiser, R.; Poggi, A.; Macq, Jean François; Thomas, Graham; Kochale, Axel; Ruiz Hidalgo, Javier
    Date: 2011-12-02
    Report


  • AV renderer with arbitrary sparse loudspeaker setups & simple interactivity. Deliverable Fascinate D5.1.2

     Kochale, Axel; Borsum, M.; Abeling, S.; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Oldfield, Rob
    Date: 2011-09-21
    Report


  • Multiview depth coding based on combined color/depth segmentation

     Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Aflaki, Payman; Calderero Patino, Felipe; Marques Acosta, Fernando
    Journal of visual communication and image representation
    Date of publication: 2011-08-18
    Journal article


    In this paper, a new coding method for multiview depth video is presented. Considering the smooth structure and sharp edges of depth maps, a segmentation-based approach is proposed. This better preserves depth contours and thus introduces fewer artifacts in the depth perception of the video. To reduce the cost associated with partition coding, an approximation of the depth partition is built from the decoded color view segmentation. This approximation is refined by sending complementary information about the relevant differences between the color and depth partitions. The depth content of each region is coded with a decomposition into an orthogonal basis, although similar decompositions may also be employed. Experimental results show that the proposed segmentation-based depth coding method outperforms H.264/AVC and H.264/MVC by more than 2 dB at similar bitrates.
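The per-region decomposition can be sketched as follows, assuming (for illustration only) a planar basis {1, x, y} fitted by least squares rather than the orthogonal basis actually used in the paper; a planar region then costs only three coefficients.

```python
import numpy as np

def code_region(depth, mask):
    # Fit the depth values inside one segmentation region with the
    # low-order basis {1, x, y} (planar stand-in for the paper's
    # orthogonal decomposition): only 3 coefficients per region.
    ys, xs = np.nonzero(mask)
    A = np.column_stack([np.ones(xs.size), xs, ys]).astype(float)
    coeffs, *_ = np.linalg.lstsq(A, depth[ys, xs].astype(float), rcond=None)
    recon = np.zeros_like(depth, dtype=float)
    recon[ys, xs] = A @ coeffs            # decoder-side reconstruction
    return coeffs, recon

# A region whose depth is exactly planar is reconstructed losslessly.
h, w = 8, 8
yy, xx = np.mgrid[0:h, 0:w]
depth = 100.0 + 2.0 * xx - 3.0 * yy
mask = xx < 4                             # left half is one region
coeffs, recon = code_region(depth, mask)
```

Smooth depth regions are exactly the case where such low-order models pay off against block-based codecs.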

  • Surface reconstruction by restricted and oriented propagation

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Image Processing
    Presentation's date: 2010-09-27
    Presentation of work at congresses


  • Generalized lifting for sparse image representation and coding

     Rolon Garrido, Julio Cesar
    Defense's date: 2010-01-25
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses


  • Adquisición multicámara para Free Viewpoint Video (MC4FVV)

     Pardas Feliu, Montserrat; Giro Nieto, Xavier; Vilaplana Besler, Veronica; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Gasull Llampallas, Antoni; Oliveras Verges, Albert; Sayrol Clols, Elisa; Casas Pla, Josep Ramon
    Participation in a competitive project


  • Best student paper award ICIP 2010

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    Award or recognition


  • FascinatE D1.1.1 End user, production and hardware and networking requirements

     Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon; Suau Cuadros, Xavier; Gibb, Andrew; Niamut, O.A.; Prins, M.J.; Zoric, G.; Engström, A.; Perry, M.; Önnevall, E.; Juhlin, O.; Macq, Jean François
    Date: 2010-09-01
    Report


  • FascinatE D1.4.1 Initial system specification

     Ruiz Hidalgo, Javier
    Date: 2010-11-01
    Report


  • FascinatE D5.1.1 AV renderer specification and basic characterisation of audience interaction

     Borsum, M.; Spille, J.; Kochale, Axel; Önnevall, E.; Zoric, G.; Ruiz Hidalgo, Javier
    Date: 2010-09-01
    Report


  • FascinatE Newsletter 1 (open access)

     Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Thallinger, Georg; Shirley, Ben
    Date: 2010-10-01
    Report


    This FascinatE newsletter explains how gesture recognition will be used in the FascinatE system, reports on our first test shoot at a Premier League football match, and announces upcoming events.

  • Format-Agnostic SCript-based INterAcTive Experience

     Casas Pla, Josep Ramon; Morros Rubió, Josep Ramon; Marques Acosta, Fernando; Pardas Feliu, Montserrat; Ruiz Hidalgo, Javier
    Participation in a competitive project


  • FascinatE D3.1.1 Survey of Metadata and Knowledge for Automated Scripting (open access)

     Bailer, W.; Kaiser, R.; Engstrom, A.; Ruiz Hidalgo, Javier; Kochale, Axel; Macq, Jean François; Rondao Alface, P.; Verzijp, Nico; Masetti, Marco; Poggi, A.; Niamut, O.A.; Shirley, Ben; Oldfield, Rob; Schreer, Oliver; Thomas, Graham
    Date: 2010-11-30
    Report


    This document defines the various types of metadata in the FascinatE system and discusses representation requirements and candidate formats. It considers metadata describing capture, production, context, content, scripts, network, terminals and users. Representation of essence and essence-like metadata (e.g. depth maps) is out of the scope of this document.

  • HESPERIA Homeland security: tecnologías para la seguridad integral en espacios públicos e infraestructuras CENIT-2005 Entregable 3.1.1 Revisión del estado del arte 2009

     Ruiz Hidalgo, Javier; Sainz, Félix; Albiol Colomer, Antonio; Albiol Colomer, Alberto; Morros Rubió, Josep Ramon
    Date: 2010-01-01
    Report


  • HESPERIA Homeland security: tecnologías para la seguridad integral en espacios públicos e infraestructuras CENIT-2005 Paquete de Trabajo 5, Actividad 5.2 E.5.2.1 Descripción del plan de pruebas

     Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Albiol Colomer, Antonio; Albiol Colomer, Alberto; Silla Martínez, María Julia; Sainz, Félix
    Date: 2010-02-01
    Report


  • Integration of audiovisual sensors and technologies in a smart room (open access)

     Neumann, J.; Casas Pla, Josep Ramon; Macho, D.; Ruiz Hidalgo, Javier
    Personal and ubiquitous computing
    Date of publication: 2009-01
    Journal article


    At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure, and indicates possibilities for high- and low-level multi-modal interaction. An example of the use of the information collected from the distributed sensor network is explained in detail: the system supports a group of students who have to solve a problem related to a lab assignment.

  • Multi-resolution illumination compensation for foreground extraction (open access; awarded)

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Image Processing
    Presentation's date: 2009-11-10
    Presentation of work at congresses


    Illumination changes may lead to false foreground (FG) segmentation and tracking results. Most existing FG extraction algorithms obtain a background (BG) estimation from temporal statistical parameters. Such algorithms assume a quasi-static BG that changes only slowly; fast illumination changes are therefore not absorbed by the BG estimator and are misclassified as FG. The aim of the proposed algorithm is to reduce illumination effects in video sequences in order to improve foreground segmentation performance.
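A minimal, single-scale sketch of the idea (the paper's method is multi-resolution; the global gain-matching rule below is an illustrative assumption, not the published algorithm):

```python
import numpy as np

def foreground_mask(frame, bg, thr=15.0):
    # Compensate a global illumination change by matching the frame's
    # mean brightness to the background model before differencing.
    gain = bg.mean() / max(frame.mean(), 1e-6)
    return np.abs(frame * gain - bg) > thr

bg = np.full((4, 4), 100.0)   # static background model
frame = bg * 1.5              # same scene under a 50% brighter light
frame[0, 0] = 300.0           # one genuine foreground pixel
mask = foreground_mask(frame, bg)
```

Without the gain compensation, the brighter frame would differ from the background model everywhere and the whole image would be flagged as foreground.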

  • Comparison of MPEG-7 descriptors for long term selection of reference frames (open access)

     Ruiz Hidalgo, Javier; Salembier Clairon, Philippe Jean
    IEEE International Conference on Acoustics, Speech and Signal Processing
    Presentation's date: 2009-04
    Presentation of work at congresses


    In recent years, the amount of multimedia content has greatly increased. This has multiplied the need for efficient compression of the content, but also for the ability to search, retrieve, browse and filter it. Video compression and indexing have generally been investigated separately. However, as the amount of multimedia content grows, it becomes interesting to study representations that provide good compression and indexing functionalities at the same time. Moreover, even if indexing metadata is created for functionalities such as search, retrieval and browsing, it can also be employed to increase the efficiency of current video codecs. Here, we use it to improve the long-term prediction step of the H.264/AVC video codec. This paper focuses on the comparison of four different MPEG-7 descriptors when used in the proposed scheme.
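Descriptor-driven reference selection can be sketched as follows, assuming (for illustration only) a plain luminance histogram in place of the MPEG-7 descriptors actually compared in the paper:

```python
import numpy as np

def descriptor(frame, bins=8):
    # Global luminance histogram as a stand-in for an MPEG-7 descriptor
    # (the paper compares four real descriptors).
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def select_long_term_reference(current, candidates):
    # Pick the candidate whose descriptor is closest (L1 distance) to
    # the current frame's, to serve as the long-term reference frame.
    d = descriptor(current)
    dists = [np.abs(descriptor(c) - d).sum() for c in candidates]
    return int(np.argmin(dists))

current = np.full((8, 8), 40)                      # dark frame
cands = [np.full((8, 8), 45), np.full((8, 8), 200)]  # similar vs. bright
best = select_long_term_reference(current, cands)
```

The appeal of the scheme is that the same metadata computed once for search and browsing is reused to steer the codec's prediction, at no extra analysis cost.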

  • How are digital images compressed in the web?

     Marques Acosta, Fernando; Menezes, M.; Ruiz Hidalgo, Javier
    Date of publication: 2009
    Book chapter


  • How are digital TV programs compressed to allow broadcasting?

     Marques Acosta, Fernando; Menezes, M.; Ruiz Hidalgo, Javier
    Date of publication: 2009
    Book chapter
