Scientific and technological production

1 to 50 of 88 results
  • Media production, delivery and interaction for platform independent systems

    DOI: 10.1002/9781118706350
    Date of publication: 2014-01-01
    Book

    Presents current trends and potential future developments by leading researchers in immersive media production, delivery, rendering and interaction. The underlying audio and video processing technology discussed in the book relates to areas such as 3D object extraction, audio event detection, 3D sound rendering, face detection, and gesture analysis and tracking using video and depth information. The book gives an insight into current trends and developments in future media production, delivery and reproduction. Consideration of the complete production, processing and distribution chain allows a full picture to be presented to the reader. Production developments covered include integrated workflows developed by researchers and industry practitioners, as well as capture of ultra-high-resolution panoramic video and 3D object-based audio across a range of programme genres. Distribution developments include script-based, format-agnostic network delivery to a full range of devices, from large-scale public panoramic displays with wavefield synthesis and ambisonic audio reproduction to 'small screen' mobile devices. Key developments at the consumer end of the chain apply to both passive and interactive viewing modes, and incorporate user interfaces such as gesture recognition and 'second screen' devices to allow manipulation of the audiovisual content.

  • Interactive rendering

     Ruiz Hidalgo, Javier; Borsum, Malte; Kochale, Axel; Zoric, Goranka
    DOI: 10.1002/9781118706350
    Date of publication: 2014-01-01
    Book chapter

  • Grup de processament d'imatge i video

     Gasull Llampallas, Antoni; Giro Nieto, Xavier; Marques Acosta, Fernando; Morros Rubió, Josep Ramon; Oliveras Verges, Albert; Pardas Feliu, Montserrat; Ruiz Hidalgo, Javier; Salembier Clairon, Philippe Jean; Sayrol Clols, Elisa; Vilaplana Besler, Veronica; Casas Pla, Josep Ramon
    Competitive project

  • Procesado de información heterogénea y señales en grafos para Big Data: aplicación en cribado de alto rendimiento, teledetección, multimedia y HCI

     Gasull Llampallas, Antoni; Ruiz Hidalgo, Javier; Giro Nieto, Xavier; Marques Acosta, Fernando; Morros Rubió, Josep Ramon; Oliveras Verges, Albert; Salembier Clairon, Philippe Jean; Sayrol Clols, Elisa; Vilaplana Besler, Veronica; Casas Pla, Josep Ramon; Pardas Feliu, Montserrat
    Competitive project

  • Gesture control interface for immersive panoramic displays  Open access

     Alcoverro Vidal, Marcel; Suau Cuadros, Xavier; Morros Rubió, Josep Ramon; López Méndez, Adolfo; Gil Moreno, Albert; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Multimedia tools and applications
    Vol. 73, num. 1, p. 491-517
    DOI: 10.1007/s11042-013-1605-7
    Date of publication: 2014-11-01
    Journal article

    In this paper, we propose a gesture-based interface designed to interact with panoramic scenes. The system combines novel static gestures with a fast hand tracking method. Our proposal is to use static gestures as shortcuts to activate functionalities of the system (i.e. volume up/down, mute, pause, etc.), and hand tracking to freely explore the panoramic video. The overall system is multi-user, and incorporates a user identification module based on face recognition, which is able both to recognize returning users and to add new users online. The system exploits depth data, making it robust to challenging illumination conditions. We show through experimental results the performance of every component of the system compared to the state of the art. We also show the results of a usability study performed with several untrained users.

    This article is available at: http://link.springer.com/article/10.1007%2Fs11042-013-1605-7
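
    To make the interaction model concrete (static gestures acting as command shortcuts, continuous hand tracking driving free exploration of the panorama), a minimal dispatcher sketch follows. All names here (SHORTCUTS, Renderer, dispatch) are illustrative assumptions, not the system's actual API.

        # Minimal sketch of the interaction model: static gestures fire shortcuts,
        # hand motion pans the panoramic view. All names are illustrative.

        SHORTCUTS = {
            "open_palm": "pause",
            "fist": "mute",
            "thumb_up": "volume_up",
            "thumb_down": "volume_down",
        }

        class Renderer:
            """Placeholder for the panoramic rendering back end."""
            def execute(self, command):
                print("command:", command)
            def pan(self, dx, dy):
                print("pan by", (dx, dy))

        def dispatch(renderer, gesture, hand, last_hand):
            """One interaction step: shortcut if a static gesture fired, else pan."""
            if gesture in SHORTCUTS:
                renderer.execute(SHORTCUTS[gesture])
            elif hand is not None and last_hand is not None:
                renderer.pan(hand[0] - last_hand[0], hand[1] - last_hand[1])
            return hand  # becomes last_hand for the next frame

        r = Renderer()
        last = dispatch(r, "fist", None, None)        # -> command: mute
        last = dispatch(r, None, (100, 80), last)     # first hand fix, nothing to pan yet
        last = dispatch(r, None, (120, 80), last)     # -> pan by (20, 0)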

  • Real-time fingertip localization conditioned on hand gesture classification

     Suau Cuadros, Xavier; Alcoverro Vidal, Marcel; López-Méndez, Adolfo; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Image and vision computing
    Vol. 32, num. 8, p. 522-532
    DOI: 10.1016/j.imavis.2014.04.015
    Date of publication: 2014-05-09
    Journal article

    A method to obtain accurate hand gesture classification and fingertip localization from depth images is proposed. The Oriented Radial Distribution feature is utilized, exploiting its ability to globally describe hand poses, but also to locally detect fingertip positions. Hence, hand gesture and fingertip locations are characterized with a single feature calculation. We propose to divide the difficult problem of locating fingertips into two more tractable problems, by taking advantage of hand gesture as an auxiliary variable. Along with the method we present the ColorTip dataset, a dataset for hand gesture recognition and fingertip classification using depth data. ColorTip contains sequences where actors wear a glove with colored fingertips, allowing automatic annotation. The proposed method is evaluated against recent works in several datasets, achieving promising results in both gesture classification and fingertip localization.
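
    The two-stage idea of using the hand gesture as an auxiliary variable for fingertip localization can be sketched roughly as below: a global descriptor selects the gesture, and the gesture then fixes how many fingertip candidates to keep. The 1-NN classifier, random toy descriptors and per-gesture finger counts are assumptions for illustration, not the paper's ORD-based pipeline.

        import numpy as np

        # Sketch of gesture-conditioned fingertip localization. A global hand
        # descriptor picks the gesture via 1-NN; the gesture then determines how
        # many fingertip candidates are retained. All data here is synthetic.

        TRAIN_DESCRIPTORS = np.random.rand(50, 32)        # one descriptor per sample
        TRAIN_LABELS = np.random.randint(0, 3, size=50)   # gesture id per sample
        FINGERS_PER_GESTURE = {0: 1, 1: 2, 2: 5}          # expected fingertip count

        def classify_gesture(descriptor):
            """1-NN gesture classification over the training descriptors."""
            dists = np.linalg.norm(TRAIN_DESCRIPTORS - descriptor, axis=1)
            return int(TRAIN_LABELS[np.argmin(dists)])

        def locate_fingertips(candidates, scores, gesture):
            """Keep the k strongest fingertip candidates, k given by the gesture."""
            k = FINGERS_PER_GESTURE[gesture]
            order = np.argsort(scores)[::-1]
            return candidates[order[:k]]

        descriptor = np.random.rand(32)                   # global hand descriptor
        candidates = np.random.rand(20, 2) * 100          # candidate (x, y) positions
        scores = np.random.rand(20)                       # local feature responses
        g = classify_gesture(descriptor)
        tips = locate_fingertips(candidates, scores, g)
        print("gesture", g, "->", len(tips), "fingertip(s) kept")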

  • New interaction modes for rich panoramic live video experiences

     Barkhuus, Louise; Zoric, Goranka; Engström, Arvid; Ruiz Hidalgo, Javier; Verzijp, Nico
    Behaviour & information technology
    Vol. 33, num. 8, p. 859-869
    DOI: 10.1080/0144929X.2014.914975
    Date of publication: 2014-07-07
    Journal article

    The possibilities of panoramic video are based on the capabilities of high-resolution digital video streams and the opportunities that higher bandwidth offers to broadcast, stream and transfer large content across platforms. With these opportunities also come challenges, such as how to focus on sub-parts of the video stream and interact with the content shown on a large screen. In this paper, we present studies of two different interaction modes with a large-scale panoramic video for live experiences; we focus on interactional challenges and explore whether it is (1) possible to develop new interactional methods for approaching this type of high-resolution content and (2) feasible for users to interact with the content in these new ways. We developed prototypes for two different interaction modes: an individual system on a mobile device, either a tablet or a mobile phone, for interacting with the content on the same device, and a non-touch gesture-based system for the home or small-group interaction. We present pilot studies where we explore the possibilities and challenges of these two interaction modes for panoramic content.

  • Human body analysis using depth data.  Open access

     Suau Cuadros, Xavier
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    Human body analysis is one of the broadest areas within the computer vision field. Researchers have put a strong effort into human body analysis, especially over the last decade, due to technological improvements in both video cameras and processing power. Human body analysis covers topics such as person detection and segmentation, human motion tracking, and action and behavior recognition. Even if human beings perform all these tasks naturally, they pose a challenging problem from a computer vision point of view. Adverse situations such as viewing perspective, clutter and occlusions, lighting conditions or variability of behavior amongst persons may turn human body analysis into an arduous task. In the computer vision field, the evolution of research is usually tightly related to the technological progress of camera sensors and computer processing power. Traditional human body analysis methods are based on color cameras, so the information is extracted from the raw color data, strongly limiting the proposals. An interesting quality leap was achieved by introducing the multiview concept, that is, having multiple color cameras recording a single scene at the same time. With multiview approaches, 3D information is available by means of stereo matching algorithms. Having 3D information is a key aspect in human motion analysis, since the human body moves in a three-dimensional space; problems such as occlusion and clutter may thus be overcome. The appearance of commercial depth cameras has brought a second leap in the human body analysis field. While traditional multiview approaches required a cumbersome and expensive setup, as well as fine camera calibration, novel depth cameras directly provide 3D information with a single camera sensor. Furthermore, depth cameras may be rapidly installed in a wide range of situations, enlarging the range of applications with respect to multiview approaches. Moreover, since depth cameras are based on infrared light, they do not suffer from illumination variations. In this thesis, we focus on the study of depth data applied to the human body analysis problem. We propose novel ways of describing depth data through specific descriptors, so that they emphasize helpful characteristics of the scene for further body analysis. These descriptors exploit the special 3D structure of depth data to outperform generalist 3D descriptors or color-based ones. We also study the problem of person detection, proposing a highly robust and fast method to detect heads. This method is extended to a hand tracker, which is used throughout the thesis as a helpful tool to enable further research. In the remainder of this dissertation, we focus on hand analysis as a subarea of human body analysis. Given the recent appearance of depth cameras, there is a lack of public datasets; we contribute a dataset for hand gesture recognition and fingertip localization using depth data. This dataset acts as a starting point for two proposals on hand gesture recognition and fingertip localization based on classification techniques. In these methods, we also exploit the above-mentioned descriptors to finely adapt to the nature of depth data.

  • End user, production and hardware and network requirements. FascinatE deliverable D1.1.3

     Niamut, Omar; Thomas, Graham; Macq, Jean François; Kienast, Gert; Engstrom, Arvid; Zoric, Goranka; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier
    Date: 2013-07-25
    Report

  • Real-time AV renderer with support for WFS and full interactivity. FascinatE deliverable D5.1.4

     Kochale, Axel; Borsum, Malte; Spille, Jens; Kropp, Holger; Alcoverro Vidal, Marcel; Gil Moreno, Albert; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Macq, Jean François; Verzijp, Nico; Oldfield, Rob; Zoric, Goranka
    Date: 2013-07-01
    Report

  • Report on final demonstration. FascinatE deliverable D6.3.1  Open access

     Thomas, Graham; Schreer, Oliver; Thallinger, Georg; Kienast, Gert; Oldfield, Rob; Ruiz Hidalgo, Javier; Macq, Jean François; Prins, M.J.
    Date: 2013-06-21
    Report

    The objective of WP6 in the FascinatE project is to organise a series of convincing demonstrations that raise awareness of the project in the broadcast and media industry, as well as providing focal points for the technical work of the project. This document reports on the third and final public demonstration of FascinatE technology, held at MediaCity UK, at the premises of the University of Salford. The centrepiece of the demonstration was the use of the end-to-end FascinatE chain to capture, deliver and display a live music and dance performance staged in the University's Digital Performance Lab. The performance ran three times during the day, with each show being preceded by a 30-minute presentation to introduce the project and explain the various aspects of the technology that were about to be demonstrated. This was accompanied by a set of stand-alone demonstrations that ran throughout the day, giving more in-depth insights into various results from the project. The event also resulted in several press publications, which are also listed in this deliverable. During the demonstrations, audio and video data were captured to support the evaluation tasks during the remainder of the project, and for use in research beyond the end of the project.

  • Dissemination and exploitation report (including Standardisation). FascinatE deliverable D7.1.3c

     Kienast, Gert; Oldfield, Rob; Batke, J.M.; Kochale, Axel; Riemann, Uwe; Spille, Jens; Ruiz Hidalgo, Javier; Steurer, Johannes; Masetti, Marco; Schreer, Oliver; Thomas, E.; Niamut, Omar
    Date: 2013-08-23
    Report

  • Report on system performance. FascinatE deliverable D1.5.4

     Schreer, Oliver; Feldmann, Ingo; Weissig, Christian; Finn, Arne; Steurer, Johannes; Thomas, Graham; Gibb, Andrew; Spille, Jens; Kochale, Axel; Kropp, Holger; Borsum, Malte; Ruiz Hidalgo, Javier; Macq, Jean François; Verzijp, Nico; Oldfield, Rob; Shirley, Ben; Prins, M.J.; Niamut, Omar; Matthew, S.; Kienast, Gert; Kaiser, Rene; Weiss, Wolfgang; Bailer, Werner
    Date: 2013-07-30
    Report

  • Detecting end-effectors on 2.5D data using geometric deformable models: application to human pose estimation

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    Computer vision and image understanding
    Vol. 117, num. 3, p. 281-288
    DOI: 10.1016/j.cviu.2012.11.006
    Date of publication: 2013-03
    Journal article

    End-effectors are usually related to the location of limbs, and their reliable detection enables robust body tracking as well as accurate pose estimation. Recent innovation in depth cameras has re-stated the pose estimation problem. We focus on the information provided by these sensors, for which we borrow the name 2.5D data from the Graphics community. In this paper we propose a human pose estimation algorithm based on topological propagation. Geometric Deformable Models are used to carry out such propagation, implemented according to the Narrow Band Level Set approach. A variant of the latter method is proposed, including a density restriction which helps preserving the topological properties of the object under analysis. Principal end-effectors are extracted from a directed graph weighted with geodesic distances, also providing a skeletal-like structure describing human pose. An evaluation against reference methods is performed with promising results. The proposed solution allows a frame-wise end-effector detection, with no temporal tracking involved, which may be generalized to the tracking of other objects beyond human body.
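
    The underlying topological principle, propagating distances along the 2.5D surface so that extremities appear as propagation maxima, can be illustrated with a plain Dijkstra geodesic over a depth-image graph; the narrow-band level-set machinery and density restriction of the paper are not reproduced, and the depth-jump threshold below is an illustrative assumption.

        import heapq
        import numpy as np

        # Sketch: geodesic propagation on 2.5D data. Pixels are graph nodes linked
        # to 4-neighbours whose depth differs by less than a threshold, so distances
        # follow the body surface instead of crossing depth discontinuities.

        def geodesic_map(depth, mask, seed, max_jump=30.0):
            """Dijkstra distances from `seed` over pixels where mask is True."""
            h, w = depth.shape
            dist = np.full((h, w), np.inf)
            dist[seed] = 0.0
            heap = [(0.0, seed)]
            while heap:
                d, (y, x) = heapq.heappop(heap)
                if d > dist[y, x]:
                    continue
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx]:
                        jump = abs(float(depth[ny, nx]) - float(depth[y, x]))
                        nd = d + 1.0 + jump
                        if jump < max_jump and nd < dist[ny, nx]:
                            dist[ny, nx] = nd
                            heapq.heappush(heap, (nd, (ny, nx)))
            return dist

        # Extremities show up as the farthest reachable points from the body centre.
        depth = np.zeros((64, 64), dtype=np.float32)
        mask = np.zeros((64, 64), dtype=bool)
        mask[10:54, 28:36] = True                          # a crude vertical "body"
        d = geodesic_map(depth, mask, seed=(32, 32))
        finite = np.where(np.isinf(d), -1.0, d)
        far = tuple(int(v) for v in np.unravel_index(np.argmax(finite), d.shape))
        print("farthest extremity candidate:", far)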

  • Gesture controlled interactive rendering in a panoramic scene  Open access

     Kochale, Axel; Ruiz Hidalgo, Javier; Borsum, Malte
    European Interactive TV Conference
    p. 188-189
    Presentation's date: 2013-06-24
    Presentation of work at congresses

    The demonstration described hereafter covers technical work carried out in the FascinatE project [1], related to the interactive retrieval and rendering of high-resolution panoramic scenes. The scenes have been captured by a special panoramic camera (the OMNICAM) [2], which captures high-resolution video featuring a wide-angle (180-degree) field of view. Users can access the content through a novel device-less and markerless gesture-based system that allows them to interact as naturally as possible, permitting the user to control the rendering of the scene by zooming, panning or framing through the panoramic scene.

  • Gesture interaction with rich TV content in the social setting

     Zoric, Goranka; Engström, Arvid; Barkhuus, Louise; Ruiz Hidalgo, Javier; Kochale, Axel
    ACM SIGCHI Conference on Human Factors in Computing Systems
    p. 1-4
    Presentation's date: 2013-04-27
    Presentation of work at congresses

    The appearance of new immersive TV content has increased the interactive possibilities presented to viewers. Increased interactivity is seen as a valuable feature for viewing richer television content, but new functionalities are limited by what can be done naturally and intuitively using available devices like remote controls. Therefore, new interaction techniques, such as visual gesture control systems, have appeared, aiming to enhance the viewers' experience. In this work we begin uncovering the potential and challenges of gesture interaction with ultra-high-definition video for people watching TV together. As a first step we have done a study with a group of people interacting with such content using a gesture-based system in the home environment.

  • Fusion of colour and depth partitions for depth map coding

     Maceira Duch, Marc; Morros Rubió, Josep Ramon; Ruiz Hidalgo, Javier
    International Conference on Digital Signal Processing
    p. 1-7
    DOI: 10.1109/ICDSP.2013.6622781
    Presentation's date: 2013-07-02
    Presentation of work at congresses

    3D video coding includes the use of multiple color views and depth maps associated with each view. An adequate coding of depth maps should be adapted to their characteristics: smooth regions and sharp edges. In this paper, a segmentation-based technique that exploits the color-depth similarity of 3D video is proposed to improve depth map compression while preserving the main discontinuities. An initial coarse depth map segmentation is used to locate the main discontinuities in depth. The resulting partition is improved by fusing a color partition. We assume that the color image is encoded first and available when the associated depth map is encoded, so the color partition can be segmented in the decoder without introducing any extra cost. A new segmentation criterion inspired by super-pixel techniques is proposed to obtain the color partition. Initial experimental results show compression efficiency similar to HEVC, with considerable potential for further improvements.
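
    A minimal sketch of the fusion step described above: each region of the color partition (recomputable at the decoder at no extra bit cost) adopts the majority label of the coarse depth partition, which snaps depth discontinuities onto color contours. The function and the toy partitions are illustrative, not the paper's exact algorithm.

        import numpy as np

        # Sketch of color/depth partition fusion: within each color region, take
        # the majority vote of the coarse depth labels, aligning depth edges with
        # color contours available at the decoder.

        def fuse_partitions(depth_labels, color_labels):
            fused = np.empty_like(depth_labels)
            for region in np.unique(color_labels):
                inside = color_labels == region
                votes = np.bincount(depth_labels[inside])
                fused[inside] = np.argmax(votes)      # majority depth label wins
            return fused

        depth_labels = np.array([[0, 0, 1, 1],
                                 [0, 0, 1, 1],
                                 [0, 1, 1, 1]])       # coarse, slightly noisy depth partition
        color_labels = np.array([[0, 0, 1, 1],
                                 [0, 0, 1, 1],
                                 [0, 0, 1, 1]])       # sharper color partition
        print(fuse_partitions(depth_labels, color_labels))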

  • Bayesian region selection for adaptive dictionary-based Super-Resolution

     Pérez-Pellitero, Eduardo; Salvador, Jordi; Ruiz Hidalgo, Javier; Rosenhahn, Bodo
    British Machine Vision Conference
    p. 1-11
    Presentation's date: 2013-09-30
    Presentation of work at congresses

    The performance of dictionary-based super-resolution (SR) strongly depends on the contents of the training dataset. Nevertheless, many dictionary-based SR methods randomly select patches from a larger set of training images to build their dictionaries, thus relying on patches being diverse enough. This paper describes an external-dictionary SR algorithm based on adaptively selecting an optimal subset of patches out of the training images. Each training image is divided into sub-image entities, named regions, of such size that texture consistency is preserved. For each input patch to super-resolve, the best-fitting region (with enough high-frequency energy) is found through a Bayesian selection. In order to handle the high number of regions in the training dataset, a local Naive Bayes Nearest Neighbor (NBNN) approach is used. Trained with this adapted subset of patches, sparse coding SR is applied to recover the high-resolution image. Experimental results demonstrate that using our adaptive algorithm produces an improvement in SR performance with respect to non-adaptive training.
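
    The Naive Bayes Nearest Neighbor selection at the core of the method can be sketched as follows: each candidate training region is scored by the summed nearest-neighbour distances of the input patches, and the lowest total wins. This brute-force sketch omits the paper's locality approximation and its high-frequency energy test.

        import numpy as np

        # NBNN-style region selection sketch: for each candidate region, sum the
        # distance from every input patch to its nearest patch in that region and
        # keep the region with the smallest total.

        def nbnn_select(input_patches, regions):
            """regions: list of (n_i, d) patch arrays; returns best region index."""
            scores = []
            for patches in regions:
                diff = input_patches[:, None, :] - patches[None, :, :]
                d = np.sqrt((diff ** 2).sum(axis=2))   # (n_inputs, n_region) distances
                scores.append(d.min(axis=1).sum())     # NN distance per input patch
            return int(np.argmin(scores))

        rng = np.random.default_rng(0)
        inputs = rng.normal(size=(8, 25))              # 8 input patches, 5x5 flattened
        regions = [rng.normal(size=(40, 25)) for _ in range(3)]
        regions[1] = np.vstack([inputs + 0.01, regions[1]])  # region 1 matches well
        print("selected region:", nbnn_select(inputs, regions))   # -> 1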

  • Towards a format-agnostic approach for production, delivery and rendering of immersive media

     Niamut, Omar A.; Kaiser, Rene; Kienast, Gert; Kochale, Axel; Spille, Jens; Schreer, Oliver; Ruiz Hidalgo, Javier; Macq, Jean François; Shirley, Ben
    ACM Multimedia Systems
    p. 249-260
    DOI: 10.1145/2483977.2484007
    Presentation's date: 2013-03-01
    Presentation of work at congresses

    The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalization (selection and control of content, availability on many devices). We investigate the feasibility of an end-to-end format-agnostic approach to support both these trends. In this paper, different aspects of a format-agnostic capture, production, delivery and rendering system are discussed. At the capture stage, the concept of layered scene representation is introduced, including panoramic video and 3D audio capture. At the analysis stage, a virtual director component is discussed that allows for automatic execution of cinematographic principles, using feature tracking and saliency detection. At the delivery stage, resolution-independent audiovisual transport mechanisms for both managed and unmanaged networks are treated. In the rendering stage, a rendering process that includes the manipulation of audiovisual content to match the connected display and loudspeaker properties is introduced. Different parts of the complete system are revisited, demonstrating the requirements and the potential of this advanced concept.

  • Interim System Specification. FascinatE deliverable D1.4.2

     Thomas, Graham; Schreer, Oliver; Shirley, Ben; Oldfield, Rob; Kaiser, Rene; Bailer, W; Steurer, Johannes; Kienast, Gert; Poggi, A.; Macq, Jean François; Zoric, G.; Ruiz Hidalgo, Javier; Niamut, Omar A.; Prins, M.J.; Borsum, M.; Spille, Jens; Kochale, Axel
    Date: 2012-01-13
    Report

  • Interim System Integration. FascinatE deliverable D1.5.2

     Schreer, Oliver; Feldmann, I.; Weissig, Ch.; Finn, A.; Spille, Jens; Steurer, Johannes; Kochale, Axel; Ruiz Hidalgo, Javier; Thomas, Graham; Gibb, Andrew; Macq, Jean François; Rondao Alface, P.; Prins, M.J.; Mathew, S.; Niamut, Omar A.; Kaiser, Rene; Weiss, Thomas; Bailer, W; Oldfield, Rob; Shirley, Ben
    Date: 2012-08-27
    Report

  • Dissemination and exploitation plan (including standardisation). FascinatE deliverable D7.1.3b

     Niamut, Omar A.; Kienast, Gert; Thallinger, Georg; Schreer, Oliver; Kochale, Axel; Ruiz Hidalgo, Javier; Thomas, Graham; Oldfield, Rob; Macq, Jean François; Masetti, Marco
    Date: 2012-02-16
    Report

  • AV renderer with enhanced processing with integration of scripting language. FascinatE deliverable D5.1.3

     Kochale, Axel; Borsum, M.; Spille, Jens; Kropp, H.; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Gil Moreno, Albert; Macq, Jean François; Oldfield, Rob
    Date: 2012-09-03
    Report

  • End user, production and hardware and network requirements. FascinatE deliverable D1.1.2

     Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon; Suau Cuadros, Xavier; Gibb, Andrew; Prins, M.J.; Zoric, G.; Engström, A.; Perry, M.; Önnevall, E.; Juhlin, O.; Hannerfors, P.; Macq, Jean François; Schreer, Oliver
    Date: 2012-02-14
    Report

  • Report on interim demonstration. FascinatE deliverable D6.2.1

     Schreer, Oliver; Thomas, Graham; Thallinger, Georg; Kienast, Gert; Oldfield, Rob; Ruiz Hidalgo, Javier; Macq, Jean François
    Date: 2012-07-25
    Report

  • Requirements for the network interfaces and interactive systems usability. FascinatE deliverable D5.3.1

     Rondao Alface, P.; Macq, Jean François; Verzijp, Nico; Zoric, G.; Önnevall, E.; Ruiz Hidalgo, Javier; Spille, Jens; Oldfield, Rob; van Brandenburg, R.
    Date: 2012-01-31
    Report

  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE transactions on multimedia
    Vol. 14, num. 3, p. 575-585
    DOI: 10.1109/TMM.2012.2189853
    Date of publication: 2012-06
    Journal article

    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes.
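
    The head-localization step, depth-based template matching restricted to an adaptive search zone around the previous estimate, might look roughly like the following; the SSD score, the fixed zone size and the synthetic depth template are illustrative simplifications, not the paper's exact formulation.

        import numpy as np

        # Depth template matching inside a search zone: only a window around the
        # last head estimate is scanned, which speeds up the search and stabilises
        # tracking under fast motion.

        def match_head(depth, template, last_pos, zone=12):
            th, tw = template.shape
            cy, cx = last_pos
            best, best_pos = np.inf, last_pos
            for y in range(max(0, cy - zone), min(depth.shape[0] - th, cy + zone) + 1):
                for x in range(max(0, cx - zone), min(depth.shape[1] - tw, cx + zone) + 1):
                    patch = depth[y:y + th, x:x + tw]
                    ssd = float(((patch - template) ** 2).sum())  # depth SSD score
                    if ssd < best:
                        best, best_pos = ssd, (y, x)
            return best_pos

        depth = np.full((120, 160), 2000.0)            # background at 2 m
        depth[40:56, 70:82] = 1200.0                   # head-like blob at 1.2 m
        template = np.full((16, 12), 1200.0)           # crude head template in depth
        print(match_head(depth, template, last_pos=(38, 68)))     # -> (40, 70)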

  • Depth map coding based on an optimal hierarchical region representation

     Maceira Duch, Marc; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon
    3DTV Conference
    p. 1-4
    DOI: 10.1109/3DTV.2012.6365481
    Presentation's date: 2012-10-15
    Presentation of work at congresses

    Multiview color information used jointly with depth maps is a widespread technique for 3D video. Using this depth information, 3D functionalities such as free viewpoint video can be provided by means of depth-image-based rendering techniques. In this paper, a new technique to encode depth maps is proposed. Based on the usually smooth structure and the sharp edges of depth maps, our proposal segments the depth map into homogeneous regions of arbitrary shape and encodes the contents of these regions using different texture coding strategies. An optimal Lagrangian approach is applied to the hierarchical region representation provided by our segmentation technique. This approach automatically selects the best encoding strategy for each region and the optimal partition to encode the depth map. To avoid the high cost of coding the resulting partition, a prediction is made using the associated decoded color image.
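
    The Lagrangian optimisation amounts to choosing, for each region, the coding strategy that minimises D + λ·R, with the partition itself selected by the same rule over the hierarchy. A toy version of the per-region selection rule is sketched below; the modes, rates and distortions are made-up numbers.

        # Toy per-region Lagrangian mode selection: each region offers several
        # texture-coding strategies with (rate, distortion) costs, and the mode
        # minimising D + lam * R is retained. All numbers are illustrative.

        LAM = 0.1

        regions = {
            "flat_wall":  {"constant": (8, 5.0),  "plane": (40, 4.5),  "dct": (150, 4.0)},
            "table_edge": {"constant": (8, 90.0), "plane": (40, 12.0), "dct": (150, 6.0)},
        }

        def select_mode(candidates, lam=LAM):
            """Return the mode with minimal rate-distortion cost D + lam * R."""
            return min(candidates, key=lambda m: candidates[m][1] + lam * candidates[m][0])

        for name, candidates in regions.items():
            # flat regions pick cheap modes; regions with edges pay for detail
            print(name, "->", select_mode(candidates))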

  • Variational reconstruction and restoration for video super-resolution

     Salvador, Jordi; Rivero, Daniel; Kochale, Axel; Ruiz Hidalgo, Javier
    International Conference on Pattern Recognition
    p. 1047-1051
    Presentation's date: 2012-11-13
    Presentation of work at congresses

    This paper presents a variational framework for obtaining super-resolved video-sequences, based on the observation that reconstruction-based Super-Resolution (SR) algorithms are limited by two factors: registration exactitude and Point Spread Function (PSF) estimation accuracy. To minimize the impact of the first limiting factor, a small-scale linear in-painting algorithm is proposed to provide smooth SR video frames. To improve the second limiting factor, a fast PSF local estimation and total variation-based denoising is proposed. Experimental results reflect the improvements provided by the proposed method when compared to classic SR approaches.
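
    Of the two ingredients, the total-variation denoising applied to the super-resolved frames is the easiest to illustrate. The following is a basic gradient-descent TV step under standard assumptions (smoothed TV gradient, arbitrary weight and step size), not the paper's exact variational formulation.

        import numpy as np

        # Gradient-descent total-variation denoising sketch: iteratively trade off
        # fidelity to the noisy frame against a smoothed TV regulariser.

        def tv_denoise(noisy, weight=0.1, step=0.2, iters=100, eps=1e-6):
            u = noisy.astype(np.float64).copy()
            for _ in range(iters):
                gx = np.roll(u, -1, axis=1) - u              # forward differences
                gy = np.roll(u, -1, axis=0) - u
                mag = np.sqrt(gx ** 2 + gy ** 2 + eps)       # smoothed gradient norm
                px, py = gx / mag, gy / mag
                div = px - np.roll(px, 1, axis=1) + py - np.roll(py, 1, axis=0)
                u += step * (weight * div - (u - noisy))     # TV smoothing + fidelity
            return u

        rng = np.random.default_rng(1)
        clean = np.zeros((32, 32))
        clean[:, 16:] = 1.0                                  # a step edge
        noisy = clean + 0.2 * rng.normal(size=clean.shape)
        den = tv_denoise(noisy)
        print("error std before:", round(float((noisy - clean).std()), 3),
              "after:", round(float((den - clean).std()), 3))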

  • INTAIRACT: Joint hand gesture and fingertip classification for touchless interaction

     Suau Cuadros, Xavier; Alcoverro Vidal, Marcel; Lopez Mendez, Adolfo; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    European Conference on Computer Vision
    p. 602-606
    DOI: 10.1007/978-3-642-33885-4_62
    Presentation's date: 2012-10-08
    Presentation of work at congresses

    In this demo we present intAIRact, an online hand-based touchless interaction system. Interactions are based on easy-to-learn hand gestures that, combined with translations and rotations, render a user-friendly and highly configurable system. The main advantage with respect to existing approaches is that we are able to robustly locate and identify fingertips. Hence, we are able to employ a simple but powerful alphabet of gestures, not only by determining the number of visible fingers in a gesture but also which fingers are being observed. To achieve such a system we propose a novel method that jointly infers hand gestures and fingertip locations using a single depth image from a consumer depth camera. Our approach is based on a novel descriptor for depth data, the Oriented Radial Distribution (ORD) [1]. On the one hand, we exploit the ORD for robust classification of hand gestures by means of efficient k-NN retrieval. On the other hand, maxima of the ORD are used to perform structured inference of fingertip locations. The proposed method outperforms other state-of-the-art approaches both in gesture recognition and fingertip localization. An implementation of the ORD extraction on a GPU yields a real-time demo running at approximately 17 fps on a single laptop.

  • Oriented radial distribution on depth data: application to the detection of end-effectors  Open access

     Suau Cuadros, Xavier; Ruiz Hidalgo, Javier; Casas Pla, Josep Ramon
    IEEE International Conference on Acoustics, Speech, and Signal Processing
    Presentation's date: 2012-03-27
    Presentation of work at congresses

    End-effectors are considered to be the main topological extremities of a given 3D body. Although the nature of such a body is not restricted, this paper focuses on the human body case. Detection of human extremities is a key issue in the human motion capture domain, being needed to initialize and update the tracker. Therefore, the effectiveness of human motion capture systems usually depends on the reliability of the obtained end-effectors. The increasing accuracy, low cost and easy installation of depth cameras have opened the door to new strategies to overcome the body pose estimation problem. With the objective of detecting the head, hands and feet of a human body, we propose a new local feature computed from depth data, which gives an idea of its curvature and prominence. This feature is weighted depending on recent detections, also providing a temporal dimension. Based on this feature, end-effector candidate blobs are obtained and classified into head, hands and feet according to three probabilistic descriptors.
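
    As a rough intuition for such a curvature-and-prominence feature (explicitly not the actual Oriented Radial Distribution definition), one can compare foreground occupancy close to a point against occupancy further out: the near/far ratio peaks around extremities such as hands or feet. A toy version on a binary depth-derived mask:

        import numpy as np

        # Toy prominence feature in the spirit of the abstract (NOT the real ORD):
        # for each foreground pixel, compare occupancy in a small disc against a
        # larger surrounding ring. Near limb ends, nearby support stays high while
        # far support drops, so the ratio peaks there.

        def prominence(mask, r_in=2, r_out=6, eps=0.01):
            h, w = mask.shape
            ys, xs = np.mgrid[-r_out:r_out + 1, -r_out:r_out + 1]
            rad = np.sqrt(ys ** 2 + xs ** 2)
            inner = rad <= r_in
            ring = (rad > r_in) & (rad <= r_out)
            pad = np.pad(mask.astype(float), r_out)
            score = np.zeros((h, w))
            for y in range(h):
                for x in range(w):
                    if mask[y, x]:
                        win = pad[y:y + 2 * r_out + 1, x:x + 2 * r_out + 1]
                        score[y, x] = win[inner].mean() / (win[ring].mean() + eps)
            return score

        mask = np.zeros((40, 40), dtype=bool)
        mask[5:35, 18:22] = True                        # a thin vertical "limb"
        s = prominence(mask)
        peak = tuple(int(v) for v in np.unravel_index(np.argmax(s), s.shape))
        print("most prominent pixel (near a limb end):", peak)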

  • Procesado de vídeo multicámara empleando información de la escena: aplicación a eventos deportivos, interacción visual y 3DTV

     Giro Nieto, Xavier; Vilaplana Besler, Veronica; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Pardas Feliu, Montserrat; Sayrol Clols, Elisa; Marques Acosta, Fernando; Salembier Clairon, Philippe Jean; Gasull Llampallas, Antoni; Oliveras Verges, Albert; Casas Pla, Josep Ramon
    Competitive project

  • AV renderer with arbitrary sparse loudspeaker setups & simple interactivity. Deliverable Fascinate D5.1.2

     Kochale, Axel; Borsum, M.; Abeling, S.; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Oldfield, Rob
    Date: 2011-09-21
    Report

  • First system integration. Fascinate deliverable D1.5.1

     Schreer, Oliver; Feldmann, I.; Finn, A.; Spille, Jens; Kochale, Axel; Ruiz Hidalgo, Javier; Gibb, Andrew; Steurer, Johannes; Thomas, Graham; Oldfield, Rob; Shirley, Ben; Thaler, M.; Macq, Jean François; Rondao Alface, P.; Prins, M.J.; Matthew, S.; Niamut, Omar A.
    Date: 2011-10-31
    Report

  • Metadata and knowledge models and tools. Fascinate deliverable D3.1.2

     Bailer, W; Kaiser, Rene; Poggi, A.; Macq, Jean François; Thomas, Graham; Kochale, Axel; Ruiz Hidalgo, Javier
    Date: 2011-12-02
    Report

  • Multiview depth coding based on combined color/depth segmentation

     Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Aflaki, Payman; Calderero Patino, Felipe; Marques Acosta, Fernando
    Journal of visual communication and image representation
    Vol. 23, num. 1, p. 42-52
    DOI: 10.1016/j.jvcir.2011.08.001
    Date of publication: 2011-08-18
    Journal article

    In this paper, a new coding method for multiview depth video is presented. Considering the smooth structure and sharp edges of depth maps, a segmentation-based approach is proposed. This further preserves the depth contours, thus introducing fewer artifacts in the depth perception of the video. To reduce the cost associated with partition coding, an approximation of the depth partition is built using the decoded color view segmentation. This approximation is refined by sending some complementary information about the relevant differences between the color and depth partitions. For coding the depth content of each region, a decomposition into an orthogonal basis is used in this paper, although similar decompositions may also be employed. Experimental results show that the proposed segmentation-based depth coding method outperforms H.264/AVC and H.264/MVC by more than 2 dB at similar bitrates.
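
    The per-region content coding can be illustrated with the simplest possible decomposition: approximating each region's depth by a least-squares plane, so that only a few coefficients per region need to be transmitted. This plane-fit sketch is a stand-in for the richer orthogonal basis used in the paper.

        import numpy as np

        # Region-based depth coding sketch: approximate the depth inside each
        # region by a plane z = a*x + b*y + c, so the encoder sends only three
        # coefficients per region; real systems use richer orthogonal bases.

        def fit_region_plane(depth, region_mask):
            ys, xs = np.nonzero(region_mask)
            A = np.column_stack([xs, ys, np.ones_like(xs)])   # [x, y, 1] design matrix
            coeffs, *_ = np.linalg.lstsq(A, depth[ys, xs], rcond=None)
            return coeffs                                      # (a, b, c)

        def reconstruct(shape, regions_and_coeffs):
            rec = np.zeros(shape)
            for mask, (a, b, c) in regions_and_coeffs:
                ys, xs = np.nonzero(mask)
                rec[ys, xs] = a * xs + b * ys + c
            return rec

        yy, xx = np.mgrid[0:16, 0:16]
        depth = np.where(xx < 8, 100.0, 50.0 + 0.5 * yy)  # two smooth regions, sharp edge
        parts = [xx < 8, xx >= 8]
        coded = [(m, fit_region_plane(depth, m)) for m in parts]
        rec = reconstruct(depth.shape, coded)
        print("max reconstruction error:", float(np.abs(rec - depth).max()))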

  • Real-time head and hand tracking based on 2.5D data

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    IEEE International Conference on Multimedia and Expo
    p. 1-6
    DOI: 10.1109/ICME.2011.6011869
    Presentation's date: 2011-07
    Presentation of work at congresses

    A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on 2.5D data from a range camera, which is exploited to resolve ambiguities and overlaps. Experimental results show high robustness against partial occlusions and fast movements. The estimated positions are fairly stable, allowing the extraction of accurate trajectories which may be used for gesture classification purposes.

  • Format-agnostic approach for production, delivery and rendering of immersive media

     Schreer, Oliver; Thomas, Graham; Niamut, Omar A.; Macq, Jean François; Kochale, Axel; Batke, J.M.; Ruiz Hidalgo, Javier; Oldfield, Rob; Shirley, Ben; Thallinger, Georg
    Networked and Electronic Media
    Presentation's date: 2011-09-27
    Presentation of work at congresses

    The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalisation (selection and control of content, availability on many devices). A capture, production, delivery and rendering system capable of supporting both these trends is being developed by the EU-funded FascinatE project. In this paper, different aspects of the format-agnostic approach are discussed, which we believe can be a promising concept for future media production and consumption. The different parts of the complete multimedia production and delivery process are revisited, demonstrating the requirements and the potential of such an advanced concept.

  • Advanced visual rendering, gesture-based interaction and distributed delivery for immersive and interactive media services

     Niamut, Omar A.; Kochale, Axel; Ruiz Hidalgo, Javier; Macq, Jean François; Kienast, Gert
    International Broadcasting Convention
    p. 1-8
    Presentation's date: 2011-09-11
    Presentation of work at congresses

  • Format-Agnostic SCript-based INterAcTive Experience

     Casas Pla, Josep Ramon; Morros Rubió, Josep Ramon; Marques Acosta, Fernando; Pardas Feliu, Montserrat; Ruiz Hidalgo, Javier
    Competitive project

  • Adquisición multicámara para Free Viewpoint Video (MC4FVV)

     Pardas Feliu, Montserrat; Giro Nieto, Xavier; Vilaplana Besler, Veronica; Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Salembier Clairon, Philippe Jean; Marques Acosta, Fernando; Gasull Llampallas, Antoni; Oliveras Verges, Albert; Sayrol Clols, Elisa; Casas Pla, Josep Ramon
    Competitive project

  • FascinatE: Format-Agnostic SCript-based INterAcTive Experience

     Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Morros Rubió, Josep Ramon; Pardas Feliu, Montserrat; Marques Acosta, Fernando
    Competitive project

  • FascinatE D5.1.1 AV renderer specification and basic characterisation of audience interaction

     Borsum, M.; Spille, Jens; Kochale, Axel; Önnevall, E.; Zoric, G.; Ruiz Hidalgo, Javier
    Date: 2010-09-01
    Report

  • FascinatE D1.4.1 Initial system specification

     Ruiz Hidalgo, Javier
    Date: 2010-11-01
    Report

  • FascinatE Newsletter 1  Open access

     Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Thallinger, Georg; Shirley, Ben
    Date: 2010-10-01
    Report

    This FascinatE newsletter explains how gesture recognition will be used in the FascinatE system, reports on our first test shoot at a Premier League football match, and describes upcoming events.

  • HESPERIA Homeland security: tecnologías para la seguridad integral en espacios públicos e infraestructuras CENIT-2005 Entregable 3.1.1 Revisión del estado del arte 2009

     Ruiz Hidalgo, Javier; Sainz, Félix; Albiol Colomer, Antonio; Albiol Colomer, Alberto; Morros Rubió, Josep Ramon
    Date: 2010-01-01
    Report

  • HESPERIA Homeland security: tecnologías para la seguridad integral en espacios públicos e infraestructuras CENIT-2005 Paquete de Trabajo 5, Actividad 5.2 E.5.2.1 Descripción del plan de pruebas

     Ruiz Hidalgo, Javier; Morros Rubió, Josep Ramon; Albiol Colomer, Antonio; Albiol Colomer, Alberto; Silla Martínez, María Julia; Sainz, Félix
    Date: 2010-02-01
    Report

  • Best student paper award ICIP 2010

     Suau Cuadros, Xavier; Casas Pla, Josep Ramon; Ruiz Hidalgo, Javier
    Award or recognition
