Automatic summarization of sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that yield rather limited results, due to the complexity of the problem and to the low capability of such descriptors to represent semantic content. In this paper, a new approach for the automatic generation of highlight summaries of soccer videos using audio-visual descriptors is presented. The approach is based on the segmentation of the video sequence into shots that are further analyzed to determine their relevance and interest. Of special interest in the approach is the use of audio information, which provides additional robustness to the overall performance of the summarization system. For every video shot, a set of low- and mid-level audio-visual descriptors is computed and subsequently combined to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the specifications of the user and the relevance measures. A variety of results obtained with real soccer video sequences prove the validity of the approach.
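The shot-scoring-and-selection scheme described in the abstract can be sketched as weighted descriptor fusion; the descriptor names, weights, and values below are hypothetical illustrations, not taken from the paper:

```python
# Illustrative sketch (not the paper's code): score each video shot by a
# weighted fusion of per-shot audio-visual descriptors, then keep the
# highest-scoring shots up to a user-specified summary length.

def score_shot(descriptors, weights):
    """Combine normalized descriptor values into a single relevance score."""
    return sum(weights[name] * value for name, value in descriptors.items())

def build_summary(shots, weights, max_shots):
    """Return the indices of the most relevant shots, in temporal order."""
    scored = [(score_shot(d, weights), i) for i, d in enumerate(shots)]
    top = sorted(scored, reverse=True)[:max_shots]
    return sorted(i for _, i in top)  # restore temporal order

# Hypothetical per-shot descriptors, each normalized to [0, 1].
shots = [
    {"audio_energy": 0.2, "crowd_excitement": 0.1, "camera_motion": 0.3},
    {"audio_energy": 0.9, "crowd_excitement": 0.8, "camera_motion": 0.7},
    {"audio_energy": 0.5, "crowd_excitement": 0.6, "camera_motion": 0.4},
]
weights = {"audio_energy": 0.4, "crowd_excitement": 0.4, "camera_motion": 0.2}

print(build_summary(shots, weights, max_shots=2))  # -> [1, 2]
```

A real system would normalize each descriptor over the whole match and tune the weights empirically; here they are fixed by hand for clarity.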
Raventos, A.; Quijada, R.; Torres, L.; Tarres, F.; Carasusan, E.; Farre, D. International Multi-Conference on Systems, Signals and Devices, p. 1-6. DOI: 10.1109/SSD.2014.6808845. Presentation date: 2014-02. Conference presentation.
Automatic generation of sports highlights from recorded audiovisual content has been an object of great interest in recent years. The problem is particularly important in the production of highlight videos for second- and third-division leagues, where the quantity of raw material is significant and lacks manual annotations. Many approaches are based mostly on the analysis of the video and disregard the important information provided by the audio track. In this paper, a new approach that combines audio and video descriptors for automatic soccer highlights generation is proposed. The approach is based on the segmentation of the video content into shots that are further analyzed in order to determine their relevance and interest. These video shots are scored by fusing different audio and video features. The paper emphasizes the importance of audio detectors, which play a key role in the analysis and scoring of the video shots. Specifically, a new algorithm for referee's whistle detection is proposed. The algorithm has proven to be very robust and efficiently discriminates professional whistles from other types of sounds such as crowd cheering, musical instruments, etc. Several results produced with real soccer video sequences prove the validity of the proposed audio and video fusion scheme.
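The whistle detector itself is not specified in the abstract; as a hedged sketch of the underlying idea, a sustained narrowband tone around 3-4 kHz can be separated from broadband noise by comparing single-bin energy (via the Goertzel algorithm) against total frame energy. The target frequency, frame size, and threshold below are illustrative assumptions, not the paper's values:

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of one DFT bin, computed with the Goertzel recursion."""
    n = len(samples)
    k = round(n * freq / sample_rate)        # nearest DFT bin to freq
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

def looks_like_whistle(frame, sample_rate, target_hz=3600.0, ratio_thr=0.5):
    """True if most of the frame's energy is concentrated near target_hz."""
    n = len(frame)
    total = sum(x * x for x in frame) * n / 2.0  # scaled to match |X[k]|^2
    if total == 0:
        return False
    return goertzel_power(frame, sample_rate, target_hz) / total > ratio_thr

sr, n = 16000, 1000
whistle = [math.sin(2 * math.pi * 3600 * t / sr) for t in range(n)]  # narrowband tone
rumble = [math.sin(2 * math.pi * 200 * t / sr) for t in range(n)]    # low-frequency noise
print(looks_like_whistle(whistle, sr))  # -> True
print(looks_like_whistle(rumble, sr))   # -> False
```

A production detector would additionally require the narrowband peak to persist over several consecutive frames, which is what distinguishes a whistle from transient tonal sounds.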
This deliverable identifies all the software demonstrators implemented to date. For each demonstrator, its location in the SVN repository is specified and a brief manual is provided for a correct understanding of its functions and usage.
This deliverable provides an overview of the MPEG-7 video descriptors, both those selected for implementation in document E1.3.1 and those that make up the rest of the standard.
This deliverable specifies the different methodologies for the analysis of ambient sound and commentary in sports events, as well as the extraction of the metadata that identifies them. As a result, a reduced set of metadata will be available for automatic annotation, whose algorithms will be designed, implemented, and evaluated in this activity.
This deliverable reviews the state of the art in automatic metadata annotation technology and its use for generating summaries of video sequences. The goal is to produce a technical report that can serve as a roadmap during the execution of the project and as the technological basis for the developments aimed at automatic content generation.
Malinowski, S.; Artigas, J.; Guillemot, C.; Torres, L. IEEE Transactions on Signal Processing, Vol. 57, num. 10, p. 4154-4158. DOI: 10.1109/TSP.2009.2023359. Publication date: 2009-10-01. Journal article.
This correspondence considers the use of punctured quasi-arithmetic (QA) codes for the Slepian–Wolf problem. These entropy codes are defined by finite state machines for memoryless and first-order memory sources. Puncturing an entropy-coded bit-stream leads to an ambiguity at the decoder side. The decoder makes use of a correlated version of the original message in order to remove this ambiguity. A complete distributed source coding (DSC) scheme based on QA encoding with side information at the decoder is presented, together with iterative structures based on QA codes. The proposed schemes are adapted to memoryless and first-order memory sources. Simulation results reveal that the proposed schemes are efficient in terms of decoding performance for short sequences compared to well-known DSC solutions using channel codes.
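The puncturing-plus-side-information mechanism can be illustrated without arithmetic coding: the toy sketch below replaces the QA code with a trivial even-parity constraint, deletes some bits, and lets the decoder pick the code-consistent candidate closest to a correlated side-information sequence. All sequences and positions are toy examples, not the paper's construction:

```python
from itertools import product

def puncture(bits, kept_positions):
    """Transmit only the bits at the kept positions; the rest are deleted."""
    return [bits[i] for i in kept_positions]

def decode(received, kept_positions, length, side_info):
    """Enumerate every word consistent with the received bits and the
    (toy) even-parity code; return the candidate with minimum Hamming
    distance to the correlated side information."""
    punctured = [i for i in range(length) if i not in kept_positions]
    best, best_dist = None, length + 1
    for fill in product([0, 1], repeat=len(punctured)):
        cand = [0] * length
        for pos, b in zip(kept_positions, received):
            cand[pos] = b
        for pos, b in zip(punctured, fill):
            cand[pos] = b
        if sum(cand) % 2 != 0:          # candidate must satisfy the code
            continue
        dist = sum(a != b for a, b in zip(cand, side_info))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best

source = [1, 0, 1, 1, 0, 0, 1, 0]      # even parity by construction
kept = [0, 2, 3, 5, 6]                 # bits 1, 4 and 7 are punctured
side = [1, 0, 1, 1, 1, 0, 1, 0]        # correlated copy: bit 4 flipped
rx = puncture(source, kept)
print(decode(rx, kept, len(source), side))  # -> [1, 0, 1, 1, 0, 0, 1, 0]
```

Without the code constraint, the decoder would simply copy the side information into the punctured positions; the constraint is what lets it reject the flipped bit, which is the role the QA code structure plays in the actual scheme.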
Guillemot, C.; Pereira, F.; Torres, L.; Ebrahimi, T.; Leonardi, R.; Ostermann, J. IEEE Signal Processing Magazine, Vol. 24, num. 5, p. 67-76. DOI: 10.1109/MSP.2007.904808. Publication date: 2007-10-15. Journal article.
A growing percentage of the world population now uses image and video coding technologies on a regular basis. These technologies are behind the success and quick deployment of services and products such as digital pictures, digital television, DVDs, and Internet video communications. Today's digital video coding paradigm, represented by the ITU-T and MPEG standards, mainly relies on a hybrid of block-based transform and interframe predictive coding approaches. In this coding framework, the encoder has the task of exploiting both the temporal and spatial redundancies present in the video sequence, which is a rather complex exercise. As a consequence, all standard video encoders have a much higher computational complexity than the decoder (typically five to ten times more complex), mainly due to the temporal correlation exploitation tools, notably the motion estimation process. This type of architecture is well suited for applications where the video is encoded once and decoded many times, i.e., one-to-many topologies such as broadcasting or video-on-demand, where the cost of the decoder is more critical than the cost of the encoder.
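The encoder-decoder complexity asymmetry mentioned above comes largely from motion estimation; a minimal full-search block-matching sketch shows why: the encoder evaluates every candidate displacement in a search window, while the decoder only applies the chosen vector. Frame contents and sizes here are illustrative:

```python
# Toy full-search block matching: (2*search+1)^2 SAD evaluations per block
# at the encoder, versus a single block copy at the decoder.

def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block and a shifted reference."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(bs) for i in range(bs)
    )

def full_search(cur, ref, bx, by, bs, search):
    """Exhaustive motion search over a +/-search window; returns best vector."""
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + bs <= len(ref)
                    and 0 <= bx + dx and bx + dx + bs <= len(ref[0])):
                continue  # candidate falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Reference frame with distinct samples; current frame shifted right by 1.
ref = [[x + 16 * y for x in range(16)] for y in range(16)]
cur = [[ref[y][max(x - 1, 0)] for x in range(16)] for y in range(16)]
vec, cost = full_search(cur, ref, bx=4, by=4, bs=4, search=3)
print(vec, cost)  # -> (-1, 0) 0
```

Real encoders use fast search strategies and sub-pixel refinement, but even those remain far costlier than the decoder-side motion compensation, which is the asymmetry the text describes.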