The combination of visual and inertial sensors for state estimation has recently attracted wide attention in the robotics community, especially in aerial robotics, due to the light weight and complementary characteristics of these sensors. However, most state estimation systems based on visual-inertial sensing impose severe processing requirements, which in many cases make them impractical. In this paper, we propose a simple, low-cost and high-rate method for state estimation that enables autonomous flight of micro aerial vehicles with a low computational burden. The proposed state estimator fuses observations from an inertial measurement unit, an optical flow smart camera and a time-of-flight range sensor. The smart camera provides optical flow measurements at rates of up to 200 Hz, removing the image-processing bottleneck from the main processor. To the best of our knowledge, this is the first example of extending the use of these smart cameras from hovering-like motions to odometry estimation, producing estimates that remain usable over flight times of several minutes. In order to validate and defend the simplest algorithmic solution, we investigate the performance of two Kalman filters, in the extended and error-state flavors, together with a large number of algorithm modifications defended in earlier literature on visual-inertial odometry, and show that their impact on filter performance is minimal. To close the control loop, a non-linear controller operating on the special Euclidean group SE(3) drives the quadrotor in 3D space based on the estimated vehicle state, guaranteeing asymptotic stability of 3D position and heading. All estimation and control tasks are solved on board, in real time, on a limited computational unit. The proposed approach is validated through simulations and experiments, including comparisons with ground-truth data provided by a motion capture system. For the benefit of the community, we make the source code public.
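To give a flavor of the fusion scheme, the following is a minimal sketch of a Kalman filter combining the three sensor modalities, assuming a near-hover simplification where body and world axes roughly align; the 6-state layout, class name and noise parameters are illustrative and not the paper's actual error-state formulation.

```python
import numpy as np

class SimpleVIFilter:
    """Toy IMU + optical-flow + time-of-flight fusion (illustrative only)."""

    def __init__(self):
        self.x = np.zeros(6)          # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6) * 0.1      # state covariance

    def predict(self, acc_world, dt, q=0.5):
        F = np.eye(6)
        F[0:3, 3:6] = np.eye(3) * dt  # position integrates velocity
        self.x = F @ self.x
        self.x[3:6] += acc_world * dt # IMU acceleration drives velocity
        self.P = F @ self.P @ F.T + np.eye(6) * q * dt

    def _update(self, z, H, R):
        y = z - H @ self.x                          # innovation
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ H) @ self.P

    def update_range(self, tof_height, r=1e-3):
        H = np.zeros((1, 6)); H[0, 2] = 1.0         # ToF observes height pz
        self._update(np.array([tof_height]), H, np.eye(1) * r)

    def update_flow(self, flow_rad_s, r=1e-2):
        # Optical flow (rad/s) scaled by height gives metric velocity.
        v_xy = np.asarray(flow_rad_s) * max(self.x[2], 0.1)
        H = np.zeros((2, 6)); H[0, 3] = 1.0; H[1, 4] = 1.0
        self._update(v_xy, H, np.eye(2) * r)
```

The high-rate flow and range updates are cheap linear corrections, which is consistent with the low computational burden the abstract emphasizes.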
Industrial robotics is giving way to a more social and assistive kind of robotics. Simplifying, we could say that while industrial robots have taken over jobs that require manipulating rigid and heavy materials, work traditionally considered masculine, assistive robots are being developed to perform tasks traditionally carried out by women, and their priority capabilities are no longer strength and precision, but rather the ability to communicate and collaborate, to manipulate soft objects such as clothing, food and people themselves, and to adapt to changing situations. In this context, the emphasis on cognitive capabilities and on human-robot interaction fosters a convergence with the humanities and, in particular, with psychology, ethics and literature.
In this paper we are interested in recognizing human actions from sequences of 3D skeleton data. For this purpose we combine a 3D Convolutional Neural Network with body representations based on Euclidean Distance Matrices (EDMs), which have recently been shown to be very effective at capturing the geometric structure of the human pose. One inherent limitation of EDMs, however, is that they are defined only up to a permutation of the skeleton joints: randomly shuffling the ordering of the joints yields many different representations. To address this issue we introduce a novel architecture that simultaneously, and in an end-to-end manner, learns an optimal transformation of the joints while optimizing the remaining parameters of the convolutional network. The proposed approach achieves state-of-the-art results on three benchmarks, including the recent NTU RGB-D dataset, on which we improve over previous LSTM-based methods by more than 10 percentage points, also surpassing other CNN-based methods while using almost 1000 times fewer parameters.
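As a minimal illustration of the representation and of the permutation ambiguity the paper addresses, the sketch below builds an EDM from a set of 3D joints and verifies that reordering the joints permutes the matrix rows and columns; the 25-joint count mirrors NTU RGB-D skeletons, but the function is generic.

```python
import numpy as np

def edm(joints):
    """joints: (J, 3) array of 3D joint positions -> (J, J) distance matrix."""
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)

joints = np.random.rand(25, 3)      # e.g. 25 joints, as in NTU RGB-D skeletons
D = edm(joints)

perm = np.random.permutation(25)    # shuffling the joint order permutes the
D_perm = edm(joints[perm])          # EDM rows and columns accordingly
assert np.allclose(D_perm, D[np.ix_(perm, perm)])
```

The assertion makes the limitation concrete: every joint ordering produces a different-looking but information-equivalent matrix, which is why the paper learns the transformation of the joints jointly with the network.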
Porzi, L.; Peñate, A.; Ricci, E.; Moreno-Noguer, F. IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5777-5783. DOI: 10.1109/IROS.2017.8206469. Presented: 2017. Conference paper.
Most recent approaches to 3D pose estimation from RGB-D images address the problem in a two-stage pipeline. First, they learn a classifier, typically a random forest, to predict the position of each input pixel on the object surface. These estimates are then used to define an energy function that is minimized with respect to the object pose. In this paper, we focus on the first stage of the problem and propose a novel classifier based on a depth-aware Convolutional Neural Network. This classifier is able to learn a scale-adaptive regression model that yields very accurate pixel-level predictions, allowing the pose to be estimated with a simple RANSAC-based scheme, with no need to optimize complex ad hoc energy functions. Our experiments on publicly available datasets show that our approach achieves remarkable improvements over state-of-the-art methods.
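A minimal sketch of what such a RANSAC-based pose stage can look like, assuming each pixel already carries a predicted 3D model coordinate (from the network) and a back-projected camera-frame point (from depth); the rigid fit here uses the standard Kabsch/SVD alignment, and all names and thresholds are illustrative rather than the paper's exact scheme.

```python
import numpy as np

def kabsch(src, dst):
    """Rigid transform (R, t) with dst ~ R @ src + t, both (N, 3)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # fix reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def ransac_pose(model_pts, cam_pts, iters=200, thresh=0.01):
    """model_pts: predicted object coordinates; cam_pts: depth back-projections."""
    best_R, best_t, best_in = None, None, -1
    n = len(model_pts)
    for _ in range(iters):
        idx = np.random.choice(n, 3, replace=False) # minimal 3-point sample
        R, t = kabsch(model_pts[idx], cam_pts[idx])
        err = np.linalg.norm(model_pts @ R.T + t - cam_pts, axis=1)
        inliers = int((err < thresh).sum())
        if inliers > best_in:
            best_R, best_t, best_in = R, t, inliers
    return best_R, best_t
```

Because each RANSAC hypothesis is a closed-form 3-point fit, the whole stage stays simple, which is the point of replacing the ad hoc energy minimization with accurate per-pixel predictions.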
Vaquero, V.; del Pino, I.; Moreno-Noguer, F.; Solá, J.; Sanfeliu, A.; Andrade-Cetto, J. European Conference on Mobile Robots, pp. 1-7. DOI: 10.1109/ECMR.2017.8098657. Presented: 2017. Conference paper.
Vehicle detection and tracking is a core ingredient for developing autonomous driving applications in urban scenarios. Recent image-based Deep Learning (DL) techniques are obtaining breakthrough results in these perception tasks. However, DL research has not yet advanced much towards processing 3D point clouds from lidar range-finders. These sensors are very common in autonomous vehicles since, despite not providing information as semantically rich as images, their performance is more robust under harsh weather conditions than that of vision sensors. In this paper we present a full vehicle detection and tracking system that works with 3D lidar information only. Our detection step uses a Convolutional Neural Network (CNN) that receives as input a featured representation of the 3D information provided by a Velodyne HDL-64 sensor and returns a per-point classification of whether each point belongs to a vehicle or not. The classified point cloud is then geometrically processed to generate observations for a multi-object tracking system implemented via a bank of Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position and velocity of the surrounding vehicles. The system is thoroughly evaluated on the KITTI tracking dataset, and we show the performance boost provided by our CNN-based vehicle detector over a standard geometric approach. Our lidar-based approach uses only about 4% of the data needed by an image-based detector while obtaining similarly competitive results.
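As a sketch of what a "featured representation" of the lidar data might look like, the code below projects the 3D points onto a front-view 2D grid with range and height channels, a common encoding for per-point CNN classification; the grid size, field-of-view bounds and channel choice are assumptions for illustration, not the paper's exact featurization.

```python
import numpy as np

def front_view_features(points, h=64, w=448,
                        v_fov=(-24.9, 2.0), h_fov=(-45.0, 45.0)):
    """points: (N, 3) lidar returns -> (2, h, w) range and height channels."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    yaw = np.degrees(np.arctan2(y, x))                     # horizontal angle
    pitch = np.degrees(np.arcsin(z / np.maximum(rng, 1e-6)))
    u = ((yaw - h_fov[0]) / (h_fov[1] - h_fov[0]) * (w - 1)).astype(int)
    v = ((v_fov[1] - pitch) / (v_fov[1] - v_fov[0]) * (h - 1)).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    feat = np.zeros((2, h, w), dtype=np.float32)
    feat[0, v[valid], u[valid]] = rng[valid]               # range channel
    feat[1, v[valid], u[valid]] = z[valid]                 # height channel
    return feat
```

A dense 2D encoding like this is what lets a standard image-style CNN produce the per-point vehicle/background labels that feed the tracking stage.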
del Pino, I.; Vaquero, V.; Massini, B.; Solá, J.; Moreno-Noguer, F.; Sanfeliu, A.; Andrade-Cetto, J. Iberian Robotics Conference, pp. 287-298. DOI: 10.1007/978-3-319-70833-1_24. Presented: 2017. Conference paper.
Vehicle detection and tracking in real scenarios are key components in developing assisted and autonomous driving systems. Lidar sensors are especially suitable for this task, as they bring robustness to harsh weather conditions while providing accurate spatial information. However, the resolution of point cloud data is very scarce in comparison to camera images. In this work we explore the possibilities of Deep Learning (DL) methodologies applied to low-resolution 3D lidar sensors such as the Velodyne VLP-16 (PUCK), in the context of vehicle detection and tracking. For this purpose we developed a lidar-based system that uses a Convolutional Neural Network (CNN) to perform point-wise vehicle detection using PUCK data, and Multi-Hypothesis Extended Kalman Filters (MH-EKF) to estimate the actual positions and velocities of the detected vehicles. Comparative studies between the proposed lower-resolution (VLP-16) tracking system and a high-end system using the Velodyne HDL-64 were carried out on the KITTI Tracking Benchmark. Moreover, to analyze the influence of the CNN-based vehicle detection approach, comparisons were also performed with respect to a geometric-only detector. The results demonstrate that the proposed low-resolution Deep Learning architecture is able to successfully accomplish the vehicle detection task, outperforming the geometric baseline approach. Moreover, we observed that our system achieves tracking performance similar to the high-end HDL-64 sensor at close range, whereas at long range detection is limited to half the distance of the higher-end sensor.
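A back-of-the-envelope calculation (not from the paper) helps explain the observed range limitation: the number of laser rings hitting a vehicle shrinks with distance, and the coarser vertical resolution of the VLP-16 (16 beams over roughly 30 degrees, about 2.0 degrees of spacing) runs out of returns much earlier than the HDL-64 (around 0.4 degrees of spacing). The target height and distances below are illustrative.

```python
import math

def beams_on_target(dist_m, target_h_m=1.5, beam_spacing_deg=2.0):
    """Approximate number of lidar rings intersecting a target of given height."""
    subtended_deg = math.degrees(math.atan2(target_h_m, dist_m))
    return int(subtended_deg / beam_spacing_deg)

for d in (10, 20, 40, 80):
    vlp16 = beams_on_target(d, beam_spacing_deg=2.0)   # VLP-16 (PUCK)
    hdl64 = beams_on_target(d, beam_spacing_deg=0.4)   # HDL-64
    print(f"{d:3d} m: VLP-16 ~{vlp16} rings, HDL-64 ~{hdl64} rings")
```

At around 40 m the VLP-16 is down to a single ring on a car-sized target while the HDL-64 still sees several, which is consistent with detection range roughly halving for the lower-end sensor.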