Agudo, A.; Moreno-Noguer, F.; Calvo, B.; Montiel, J.M.M. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, no. 5, pp. 979-994. DOI: 10.1109/TPAMI.2015.2469293. Publication date: 2016-05-01. Journal article.
We propose a new approach to simultaneously recover camera pose and the 3D shape of non-rigid and potentially extensible surfaces from a monocular image sequence. For this purpose, we make use of the Extended Kalman Filter based Simultaneous Localization And Mapping (EKF-SLAM) formulation, a Bayesian estimation framework traditionally used in mobile robotics for estimating camera pose and reconstructing rigid scenarios. In order to extend the problem to a deformable domain, we represent the object's surface mechanics by means of Navier's equations, which are solved using a Finite Element Method (FEM). With these main ingredients, we can further model the material's stretching, allowing us to go a step beyond most current techniques, which are typically constrained to surfaces undergoing isometric deformations. We extensively validate our approach in both real and synthetic experiments, and demonstrate its advantages with respect to competing methods. More specifically, we show that besides simultaneously retrieving camera pose and non-rigid shape, our approach is adequate for both isometric and extensible surfaces, requires neither batch processing of all the frames nor tracking points over the whole sequence, and runs at several frames per second.
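At the core of EKF-SLAM is the alternation of a prediction step, which propagates the state (camera pose plus map or shape parameters) through the process model, and an update step, which corrects it with the current image observations. The sketch below shows one such predict/update cycle in generic form; the function name `ekf_step` and its arguments are illustrative, not the authors' implementation.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One predict/update cycle of an extended Kalman filter (sketch).

    x, P : state mean and covariance
    z    : current measurement (e.g. stacked 2D feature observations)
    f, F : process model and its Jacobian, evaluated at x
    h, H : measurement model and its Jacobian, at the prediction
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the mean through the (possibly non-linear)
    # process model, and the covariance through its linearization.
    x_pred = f(x)
    Fx = F(x)
    P_pred = Fx @ P @ Fx.T + Q

    # Update: correct the prediction with the measurement innovation.
    Hx = H(x_pred)
    S = Hx @ P_pred @ Hx.T + R            # innovation covariance
    K = P_pred @ Hx.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ Hx) @ P_pred
    return x_new, P_new
```

In the paper's setting the state additionally carries the FEM nodal displacements, so the update step simultaneously refines camera pose and non-rigid shape from each frame's observations.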
This paper describes a real-time sequential method to simultaneously recover the camera motion and the 3D shape of deformable objects from a calibrated monocular video. For this purpose, we consider the Navier-Cauchy equations used in 3D linear elasticity, solved by finite elements, to model the time-varying shape per frame. These equations are embedded in an extended Kalman filter, resulting in a sequential Bayesian estimation approach. We represent the shape, with unknown material properties, as a combination of elastic elements whose nodal points correspond to salient points in the image. The global rigidity of the shape is encoded by a stiffness matrix, computed by assembling each of these elements. With this piecewise model, we can linearly relate the 3D displacements to the 3D forces that cause the object's deformation, assumed to be normally distributed. While standard finite-element-method techniques require imposing boundary conditions to solve the resulting linear system, in this work we eliminate this requirement by modeling the compliance matrix with a generalized pseudoinverse that enforces a pre-fixed rank. Our framework also ensures surface continuity without the need for a post-processing step to stitch all the piecewise reconstructions into a global smooth shape. We present experimental results using both synthetic and real videos for scenarios ranging from isometric to elastic deformations. We also show the consistency of the estimation with respect to 3D ground-truth data, include several experiments assessing robustness against artifacts, and finally provide an experimental validation of real-time performance at frame rate for small maps.
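The two linear-algebra ingredients above, assembling per-element stiffness blocks into a global matrix and inverting it through a rank-constrained pseudoinverse, can be sketched as follows. This is a minimal illustration under assumed conventions (3 degrees of freedom per node, a caller-supplied local stiffness routine); the function names are hypothetical.

```python
import numpy as np

def assemble_stiffness(n_nodes, elements, k_el):
    """Assemble a global stiffness matrix from per-element blocks.

    elements : list of node-index tuples, one tuple per finite element
    k_el     : callable returning the local stiffness matrix of an element
    """
    K = np.zeros((3 * n_nodes, 3 * n_nodes))
    for el in elements:
        Ke = k_el(el)  # local (3m x 3m) stiffness for an m-node element
        dof = np.concatenate([np.arange(3 * i, 3 * i + 3) for i in el])
        K[np.ix_(dof, dof)] += Ke  # scatter-add into the global DOFs
    return K

def compliance_fixed_rank(K, rank):
    """Generalized pseudoinverse of K truncated to a pre-fixed rank.

    Zeroing the smallest singular values discards the unconstrained
    (rigid-body) modes, so displacements u = C @ f can be computed
    without imposing explicit boundary conditions.
    """
    U, s, Vt = np.linalg.svd(K)
    s_inv = np.zeros_like(s)
    s_inv[:rank] = 1.0 / s[:rank]   # invert only the retained modes
    return (Vt.T * s_inv) @ U.T
```

With the compliance matrix `C` in hand, the Gaussian-distributed forces map linearly to nodal displacements, which is what lets the filter treat the deformation linearly per frame.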
In recent years, there has been a growing interest in tackling the Non-Rigid Structure from Motion (NRSfM) problem, where the shape of a deformable object and the pose of a moving camera are simultaneously estimated from a monocular video sequence. Existing solutions are limited to single objects and continuous, smoothly changing sequences. In this paper we extend NRSfM to a multi-instance domain, in which the images do not need to have temporal consistency, allowing, for instance, the joint reconstruction of the faces of multiple persons from an unordered list of images. For this purpose, we present a new formulation of the problem based on a dual low-rank shape representation that simultaneously captures the between- and within-individual deformations. The parameters of this model are learned using a variant of probabilistic linear discriminant analysis that requires consecutive batches of expectation and maximization steps. The resulting approach estimates the 3D deformable shape and pose of multiple instances from only 2D point observations on a collection of images, without requiring pre-trained 3D data, and is shown to be robust to noisy measurements and missing points. We provide quantitative and qualitative evaluation on both synthetic and real data, and show consistent benefits compared to the current state of the art.
Agudo, A.; Montiel, J.M.M.; Calvo, B.; Moreno-Noguer, F. IEEE Winter Conference on Applications of Computer Vision, p. 1. DOI: 10.1109/WACV.2016.7477725. Presentation date: 2016. Conference paper.
This paper describes an on-line approach for estimating non-rigid shape and camera pose from monocular video sequences. We assume an initial estimate of the shape at rest to be given and represented by a triangulated mesh, which is encoded by a matrix of the distances between every pair of vertices. By applying spectral analysis on this matrix, we are then able to compute a low-dimensional shape basis which, in contrast to standard approaches, has a very direct physical interpretation and requires a much smaller number of modes to span a large variety of deformations, for either inextensible or extensible configurations. Based on this low-rank model, we then sequentially retrieve both camera motion and non-rigid shape in each image, optimizing the model parameters with bundle adjustment over a sliding window of image frames. Since the number of these parameters is small, especially when considering physical priors, our approach may potentially achieve real-time performance. Experimental results on real videos for different scenarios demonstrate remarkable robustness to artifacts such as missing and noisy observations.
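The basis-construction step described above can be sketched as follows: build the pairwise-distance matrix of the rest shape, take its spectral decomposition, and keep the dominant modes. This is an illustrative simplification, not the paper's exact formulation; `shape_modes` and its conventions are assumptions.

```python
import numpy as np

def shape_modes(rest_vertices, n_modes):
    """Low-dimensional deformation basis via spectral analysis of the
    pairwise-distance matrix of the rest shape (illustrative sketch).

    rest_vertices : (N, 3) array of mesh vertex positions at rest
    n_modes       : number of basis vectors to retain
    """
    diff = rest_vertices[:, None, :] - rest_vertices[None, :, :]
    D = np.linalg.norm(diff, axis=-1)   # (N, N) pairwise-distance matrix
    # D is symmetric, so its eigenvectors form an orthonormal basis.
    w, V = np.linalg.eigh(D)
    order = np.argsort(-np.abs(w))      # dominant spectral modes first
    return V[:, order[:n_modes]]        # (N, n_modes) shape basis
```

A deformed shape is then expressed per coordinate as the rest shape plus a small linear combination of these modes, which is what keeps the bundle-adjustment parameter count low.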
In this paper, we propose a sequential solution to simultaneously estimate camera pose and non-rigid 3D shape from a monocular video. In contrast to most existing approaches that rely on global representations of the shape, we model the object at a local level, as an ensemble of particles, each ruled by the linear equation of Newton's second law of motion. This dynamic model is incorporated into a bundle adjustment framework, in combination with simple regularization components that ensure temporal and spatial consistency of the estimated shape and camera poses. The resulting approach is both efficient and robust to several artifacts such as noisy and missing data or sudden camera motions, while it does not require any training data at all. Validation is done on a variety of real video sequences, including articulated and non-rigid motion, both for continuous and discontinuous shapes. Our system is shown to perform comparably to competing batch methods, which are computationally expensive, and shows a remarkable improvement with respect to sequential ones.
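The particle model above boils down to integrating Newton's second law per particle. A minimal semi-implicit Euler step, shown below as an assumed discretization (the paper's exact integration scheme is not specified in this abstract), makes the linearity of the dynamics explicit.

```python
import numpy as np

def advance_particles(x, v, f, mass, dt):
    """One semi-implicit Euler step for an ensemble of independent
    particles, each governed by Newton's second law f = m * a (sketch).

    x, v : (N, 3) particle positions and velocities
    f    : (N, 3) net force acting on each particle
    """
    a = f / mass              # acceleration from Newton's second law
    v_next = v + dt * a       # integrate velocity first
    x_next = x + dt * v_next  # then position (semi-implicit Euler)
    return x_next, v_next
```

Because each particle evolves independently, missing observations affect only the particles they belong to, which is one reason such local models degrade gracefully with data gaps.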
In this paper, we address the problem of simultaneously recovering the 3D shape and pose of a deformable and potentially elastic object from 2D motion. This is a highly ambiguous problem, typically tackled by using low-rank shape and trajectory constraints. We show that formulating the problem in terms of a low-rank force space that induces the deformation allows for a better physical interpretation of the resulting priors and a more accurate representation of the actual object's behavior. However, this comes at the price of having to estimate, besides force and pose, the elastic model of the object. For this, we use an Expectation Maximization strategy, where each of these parameters is successively learned within partial M-steps, while robustly dealing with missing observations. We thoroughly validate the approach on both mocap and real sequences, showing more accurate 3D reconstructions than the state of the art, and additionally providing an estimate of the full elastic model with no a priori information.