Visual multimedia have become an inseparable part of our digital social lives, and they often capture moments tied to deep affections. Automated visual sentiment analysis tools can provide a means of extracting the rich feelings and latent dispositions embedded in these media. In this work, we explore how Convolutional Neural Networks (CNNs), now a de facto machine learning tool, particularly in Computer Vision, can be applied to the task of visual sentiment prediction. We accomplish this through fine-tuning experiments using a state-of-the-art CNN, and via rigorous architecture analysis we present several modifications that lead to accuracy improvements over prior art on a dataset of images from a popular social media platform. We additionally present visualizations of local patterns that the network learned to associate with image sentiment, for insight into how visual positivity (or negativity) is perceived by the model.
In this paper we bring the tools of the Simultaneous Localization and Map Building (SLAM) problem from a rigid to a deformable domain and use them to simultaneously recover the 3D shape of non-rigid surfaces and the sequence of poses of a moving camera. Under the assumption that the surface shape may be represented as a weighted sum of deformation modes, we show that the problem of estimating the modal weights along with the camera poses can be probabilistically formulated as a maximum a posteriori estimate and solved using an iterative least squares optimization. In addition, the probabilistic formulation we propose is very general and allows different constraints to be introduced without added complexity. As a proof of concept, we show that local inextensibility constraints that prevent the surface from stretching can be easily integrated.
An extensive evaluation on synthetic and real data demonstrates that our method has several advantages over current non-rigid shape from motion approaches. In particular, we show that our solution is robust to large amounts of noise and outliers, and that it needs neither to track points over the whole sequence nor an initialization close to the ground truth.
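The modal shape model and the inextensibility constraint described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the flat `[x0, y0, z0, x1, ...]` coordinate layout, and the use of a rest shape to which the weighted modes are added are all assumptions made for illustration. A least squares solver would drive the stretch residuals toward zero together with the reprojection error.

```python
import math

def shape_from_modes(rest_shape, modes, weights):
    """Return rest_shape + sum_k weights[k] * modes[k] (hypothetical layout:
    flat list of 3D coordinates, one mode per deformation basis vector)."""
    if len(modes) != len(weights):
        raise ValueError("one weight is needed per deformation mode")
    shape = list(rest_shape)
    for w, mode in zip(weights, modes):
        if len(mode) != len(shape):
            raise ValueError("mode and shape must have the same length")
        shape = [s + w * m for s, m in zip(shape, mode)]
    return shape

def stretch_residuals(shape, edges, rest_lengths):
    """Length change of each edge (i, j) relative to its rest length.
    An inextensible surface keeps every residual at zero."""
    residuals = []
    for (i, j), rest in zip(edges, rest_lengths):
        p = shape[3 * i:3 * i + 3]
        q = shape[3 * j:3 * j + 3]
        residuals.append(math.dist(p, q) - rest)
    return residuals
```

For example, a two-point surface with a single mode that pulls the second point along x yields a non-zero residual as soon as the modal weight is non-zero, which is exactly what the constraint penalizes.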
Advanced segmentation techniques in the surveillance domain deal with shadows to avoid distortions when detecting moving objects. Most approaches for shadow detection are still typically restricted to penumbra shadows and cannot cope well with umbra shadows. Consequently, umbra shadow regions are usually detected as part of moving objects, thus affecting the performance of the final detection. In this paper we address the detection of both penumbra and umbra shadow regions. First, a novel bottom-up approach is presented based on gradient and colour models, which successfully discriminates between chromatic moving cast shadow regions and those regions detected as moving objects. In essence, those regions corresponding to potential shadows are detected based on edge partitioning and colour statistics. Subsequently, (i) temporal similarities between textures and (ii) spatial similarities between chrominance angle and brightness distortions are analysed for each potential shadow region in order to detect the umbra shadow regions. Our second contribution further refines the segmentation results: a tracking-based top-down approach increases the performance of our bottom-up chromatic shadow detection algorithm by properly correcting non-detected shadows. To do so, a combination of motion filters in a data association framework exploits the temporal consistency between objects and shadows to increase the shadow detection rate. Experimental results exceed the current state of the art in shadow accuracy for multiple well-known surveillance image databases which contain different shadowed materials and illumination conditions.
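As a rough illustration of the two colour cues named above, the sketch below computes a brightness distortion and a chrominance angle for one pixel against its background model. The formulas are a common RGB formulation and are assumed here, not taken from the paper; the function names and all thresholds are hypothetical. The paper analyses spatial similarities of these quantities over a region, whereas this sketch only evaluates them at a single pixel.

```python
import math

def brightness_and_chroma_angle(pixel, background):
    """Return (brightness distortion, chrominance angle in radians).

    Brightness distortion is the scale that best maps the background colour
    onto the pixel; the angle measures how far the pixel's colour direction
    deviates from the background's. Both definitions are assumptions.
    """
    dot = sum(p * b for p, b in zip(pixel, background))
    bg_norm_sq = sum(b * b for b in background)
    px_norm = math.sqrt(sum(p * p for p in pixel))
    if bg_norm_sq == 0 or px_norm == 0:
        raise ValueError("zero-length colour vector")
    brightness = dot / bg_norm_sq
    cos_angle = dot / (px_norm * math.sqrt(bg_norm_sq))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return brightness, math.acos(cos_angle)

def looks_like_shadow(pixel, background,
                      min_brightness=0.3, max_brightness=0.95, max_angle=0.1):
    """Hypothetical per-pixel test: darker than background, similar hue."""
    brightness, angle = brightness_and_chroma_angle(pixel, background)
    return min_brightness <= brightness <= max_brightness and angle <= max_angle
```

A fixed-threshold test of this kind only covers shadows that darken the background without shifting its colour, which is why chromatic umbra regions need the additional texture and spatial analysis described in the abstract.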
Suau, X.; Alcoverro, M.; López-Méndez, A.; Ruiz-Hidalgo, J.; Casas, J. Image and Vision Computing, Vol. 32, No. 8, pp. 522-532. DOI: 10.1016/j.imavis.2014.04.015. Publication date: 2014-05-09. Journal article.
A method to obtain accurate hand gesture classification and fingertip localization from depth images is proposed. The Oriented Radial Distribution feature is utilized, exploiting its ability to globally describe hand poses, but also to locally detect fingertip positions. Hence, hand gesture and fingertip locations are characterized with a single feature calculation. We propose to divide the difficult problem of locating fingertips into two more tractable problems, by taking advantage of hand gesture as an auxiliary variable. Along with the method we present the ColorTip dataset, a dataset for hand gesture recognition and fingertip classification using depth data. ColorTip contains sequences where actors wear a glove with colored fingertips, allowing automatic annotation. The proposed method is evaluated against recent works on several datasets, achieving promising results in both gesture classification and fingertip localization.
An algorithm to estimate camera motion from the progressive deformation of a tracked contour in the acquired video stream has been previously proposed. It relies on the fact that two views of a plane are related by an affinity, whose six parameters can be used to derive the six degrees-of-freedom of camera motion between the two views. In this paper we evaluate the accuracy of the algorithm. Monte Carlo simulations show that translations parallel to the image plane and rotations about the optical axis are better recovered than translations along this axis, which in turn are more accurate than rotations out of the plane. Concerning covariances, only the three less precise degrees-of-freedom appear to be correlated. In order to obtain means and covariances of 3D motions quickly on a working robot system, we resort to the Unscented Transformation (UT), which requires only 13 samples per view, after validating its usage through the previous Monte Carlo simulations. Two sets of experiments have been performed: short-range motion recovery has been tested using a Stäubli robot arm in a controlled lab setting, while the precision of the algorithm when facing long translations has been assessed by means of a vehicle-mounted camera on a factory floor. In the latter, more unfavourable case, the obtained errors are around 3%, which seems accurate enough for transferring operations.
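The figure of 13 samples is consistent with the standard Unscented Transformation, which draws 2n + 1 sigma points for an n-dimensional input; with the six affinity parameters as input, n = 6 gives 13. That reading is an assumption, as is everything in the sketch below: it shows the basic symmetric sigma-point scheme with a single scaling parameter `kappa`, and the motion-recovery function `f` is a placeholder supplied by the caller.

```python
import math

def cholesky(P):
    """Lower-triangular L with L * L^T = P, for a symmetric positive definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def sigma_points(mean, cov, kappa=0.0):
    """Return 2n + 1 sigma points and their weights (requires n + kappa > 0)."""
    n = len(mean)
    scale = n + kappa
    L = cholesky([[scale * c for c in row] for row in cov])
    points = [list(mean)]
    for j in range(n):
        column = [L[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
    weights = [kappa / scale] + [1.0 / (2.0 * scale)] * (2 * n)
    return points, weights

def unscented_transform(f, mean, cov, kappa=0.0):
    """Propagate (mean, cov) through f and return the output mean and covariance."""
    points, weights = sigma_points(mean, cov, kappa)
    ys = [f(p) for p in points]
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    y_cov = [[sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j])
                  for w, y in zip(weights, ys))
              for j in range(m)] for i in range(m)]
    return y_mean, y_cov
```

Each view then costs 13 evaluations of `f` instead of the thousands a Monte Carlo run would need, which is the appeal for a working robot system.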
In this paper, a Cognitive Vision System (CVS) is presented, which explains the human behaviour of monitored scenes using natural-language texts. This cognitive analysis of human movements recorded in image sequences is here referred to as Human Sequence Evaluation (HSE), which defines a set of transformation modules involved in the automatic generation of semantic descriptions from pixel values. In essence, the trajectories of human agents are obtained to generate textual interpretations of their motion, and also to infer the conceptual relationships of each agent w.r.t. its environment. For this purpose, a human behaviour model based on Situation Graph Trees (SGTs) is considered, which permits both bottom-up (hypothesis generation) and top-down (hypothesis refinement) analysis of dynamic scenes. The resulting system prototype interprets different kinds of behaviour and reports textual descriptions in multiple
In this paper we propose a new technique to perform figure-ground segmentation in image sequences of moving objects under varying illumination conditions. Unlike most algorithms that adapt color, we do not assume that the viewing conditions change smoothly. To cope with this, we propose the use of a new colorspace that maximizes the foreground/background class separability based on the 'Linear Discriminant Analysis' method. Moreover, we introduce a technique that formulates multiple hypotheses about the next state of the color distribution (some of these hypotheses take into account small and gradual changes in the color model, while others consider more abrupt and unexpected variations); the hypothesis that generates the best object segmentation is used to remove noisy edges from the image. This considerably simplifies the final step of fitting a deformable contour to the object boundary, thus allowing a standard snake formulation to successfully track nonrigid contours. In the same manner, the contour estimate is used to correct the color model. The integration of color and shape is done in a stage called 'sample concentration', introduced as a final step to the well-known condensation algorithm.
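To make the idea of a colorspace that maximizes foreground/background separability concrete, the sketch below computes the classic two-class Fisher discriminant direction from labelled color samples. It is only an assumed illustration: the abstract does not say how many discriminant axes the new colorspace has or how samples are gathered, and the function names and the small `ridge` regularizer are hypothetical.

```python
def mean_vec(samples):
    """Component-wise mean of a list of equal-length color vectors."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def scatter(samples, mu):
    """Sum of outer products of the deviations from the class mean."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fisher_direction(fg, bg, ridge=1e-6):
    """Direction w = Sw^-1 (mu_fg - mu_bg); ridge keeps Sw invertible."""
    mu_f, mu_b = mean_vec(fg), mean_vec(bg)
    Sf, Sb = scatter(fg, mu_f), scatter(bg, mu_b)
    d = len(mu_f)
    Sw = [[Sf[i][j] + Sb[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    return solve(Sw, [mu_f[i] - mu_b[i] for i in range(d)])

def project(pixel, w):
    """Scalar coordinate of a pixel along the discriminant direction."""
    return sum(p * wi for p, wi in zip(pixel, w))
```

Projecting every pixel with `project` gives a one-channel image in which the foreground class mean lies above the background class mean, so a threshold or a per-hypothesis color model can operate on a better separated feature.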