Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Urbano-Márquez, A.; Beléndez, A. Computer Physics Communications, Vol. 191, pp. 43-51. DOI: 10.1016/j.cpc.2015.01.017. Publication date: 2015-06-01. Journal article.
The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations. We propose two new implementations of the two-dimensional FDTD scheme, based on multi-CPU and multi-GPU hardware, respectively. In the first implementation, the open-source message-passing interface Open MPI (OMPI) has been used to fully exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, the CPU code version includes the streaming SIMD extensions (SSE) and the advanced vector extensions (AVX), combined with shared-memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on peer-to-peer communications available in CUDA on two GPUs (NVIDIA GTX 670). The paper then presents a detailed analysis of the influence of the different code versions, including shared-memory approaches, vector instructions and multiple processors (both CPU and GPU), and compares them in order to delimit the degree of improvement obtained with distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, showing that adding shared-memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the range of simulation sizes that use the CPU cache memory efficiently. In this regime, GPU computing is roughly twice as fast as the fine-tuned CPU version with both one and two nodes. However, for massive computations explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and performance tends to match that of the sequential version with auto-vectorisation and a shared-memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, with peak performance values of 80 GFlops and 146 GFlops for one GPU and two GPUs, respectively. Finally, the method is applied to an Earth crust profile in order to demonstrate the potential of our approach and the necessity of acceleration strategies in this type of application.
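The abstract does not include code; as a rough illustration, the sketch below shows a minimal 2-D velocity-stress FDTD stencil in numpy, the kind of kernel that the SSE/AVX, shared-memory and CUDA versions accelerate. Grid sizes, material constants and the source term are placeholder values, not those of the paper.

```python
import numpy as np

# Placeholder grid, time step and homogeneous material (not the paper's setup)
nx, ny, nt = 200, 200, 500
dx, dt = 1.0, 2e-4
rho, lam, mu = 2000.0, 1.0e7, 5.0e6            # density and Lame parameters

vx = np.zeros((nx, ny)); vy = np.zeros((nx, ny))    # particle velocities
txx = np.zeros((nx, ny)); tyy = np.zeros((nx, ny))  # normal stresses
txy = np.zeros((nx, ny))                            # shear stress

for it in range(nt):
    # velocities updated from stress gradients (staggered-grid differences)
    vx[1:-1, 1:-1] += dt / (rho * dx) * (
        txx[1:-1, 1:-1] - txx[:-2, 1:-1] + txy[1:-1, 1:-1] - txy[1:-1, :-2])
    vy[1:-1, 1:-1] += dt / (rho * dx) * (
        txy[1:-1, 1:-1] - txy[:-2, 1:-1] + tyy[1:-1, 1:-1] - tyy[1:-1, :-2])
    # stresses updated from velocity gradients (supports P and S waves)
    dvx_dx = (vx[2:, 1:-1] - vx[1:-1, 1:-1]) / dx
    dvy_dy = (vy[1:-1, 2:] - vy[1:-1, 1:-1]) / dx
    txx[1:-1, 1:-1] += dt * ((lam + 2 * mu) * dvx_dx + lam * dvy_dy)
    tyy[1:-1, 1:-1] += dt * (lam * dvx_dx + (lam + 2 * mu) * dvy_dy)
    txy[1:-1, 1:-1] += dt * mu / dx * (
        vx[1:-1, 2:] - vx[1:-1, 1:-1] + vy[2:, 1:-1] - vy[1:-1, 1:-1])
    # Gaussian pulse injected at the grid centre as a toy source
    txx[nx // 2, ny // 2] += np.exp(-((it * dt - 0.02) / 0.005) ** 2)
```

In the distributed versions described in the abstract, a stencil of this kind is split across processors or devices, with halo rows exchanged via Open MPI on the CPUs and via CUDA peer-to-peer copies on the two GPUs.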
In recent years the force-matching algorithm has emerged as a promising method for deriving next-generation classical force fields. Recently it has been successfully used to parameterize new water models based on pairwise potentials. In this contribution we present a refined version of the method, which has been applied to parameterize a new force field for flexible water. Static and dynamical properties of the new water model reproduce the ab initio results fairly well. This enriched version of the method could be applied to produce a fully flexible force field for any molecular system in which intramolecular motion plays a central role.
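As a rough, hypothetical illustration of the force-matching idea (not the authors' code or their water model), the sketch below fits the parameters of a simple pairwise potential by least-squares matching of its forces against a set of reference forces; here the "ab initio" forces are synthesised from known parameters so the fit can be verified.

```python
import numpy as np
from scipy.optimize import least_squares

def pair_forces(pos, eps, sigma):
    """Forces on all particles from a Lennard-Jones pair potential."""
    f = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            d = np.linalg.norm(rij)
            mag = 24 * eps * (2 * (sigma / d) ** 12 - (sigma / d) ** 6) / d
            f[i] += mag * rij / d
            f[j] -= mag * rij / d
    return f

def residuals(params, configs, ref_forces):
    # force-matching objective: model forces minus reference forces
    return np.concatenate([(pair_forces(c, *params) - fr).ravel()
                           for c, fr in zip(configs, ref_forces)])

rng = np.random.default_rng(0)
# small jittered-lattice configurations standing in for ab initio snapshots
lattice = np.array([[i, j, k] for i in range(2) for j in range(2)
                    for k in range(2)], float) * 1.5
configs = [lattice + 0.1 * rng.standard_normal(lattice.shape) for _ in range(10)]
ref_forces = [pair_forces(c, 1.0, 1.2) for c in configs]  # synthetic reference

fit = least_squares(residuals, x0=[0.5, 1.0], args=(configs, ref_forces))
print(fit.x)  # recovers eps ~= 1.0, sigma ~= 1.2
```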
We present new software for the retrieval of the volume distribution (and thus of other relevant microphysical properties, such as the effective radius) of stratospheric and tropospheric aerosols from multiwavelength LIDAR data. We treat the basic equation as a linear ill-posed problem and solve the linear system derived from spline collocation. We also consider the technical implications of the algorithm's implementation. To reduce the runtime incurred by the vast theoretical search space, experiments were run on the MareNostrum supercomputer to understand the significance of the different search-space dimensions for the quality of the solution, with the goal of restricting, or eliminating entirely, certain dimensions in order to massively reduce the calculation time of later production runs. The results show that the search space can be reduced, in line with the available computing power, while still yielding reasonable results. The parallel software also proved to scale well.
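As a schematic sketch of this kind of retrieval (with a placeholder kernel standing in for the real scattering kernels, piecewise-linear hat functions as a simple spline basis, and an assumed 3-backscatter + 2-extinction channel set), the fragment below discretises the underlying integral equation by collocation and solves the resulting ill-posed linear system with Tikhonov regularisation.

```python
import numpy as np

# assumed 3-backscatter + 2-extinction channels; wavelengths in micrometres
channels = [(0.355, "beta"), (0.532, "beta"), (1.064, "beta"),
            (0.355, "alpha"), (0.532, "alpha")]
r = np.linspace(0.05, 2.0, 400)          # particle radius grid (um)

def kernel(lam, r, kind):
    # placeholder for the real (Mie) kernels K(lam, r); structure only
    shape = np.exp(-((r / lam) - 1.0) ** 2)
    return shape if kind == "beta" else shape * r

# piecewise-linear (hat) spline basis on coarse nodes
nodes = np.linspace(0.05, 2.0, 12)
h = nodes[1] - nodes[0]
B = np.maximum(0.0, 1.0 - np.abs(r[:, None] - nodes[None, :]) / h)

# collocation matrix: A[i, j] = integral of K_i(r) * B_j(r) dr
A = np.array([np.trapz(kernel(lam, r, kind)[:, None] * B, r, axis=0)
              for lam, kind in channels])

# synthetic noisy optical data from a known volume distribution
x_true = np.exp(-((nodes - 0.8) / 0.3) ** 2)
rng = np.random.default_rng(0)
b = A @ x_true + 1e-3 * rng.standard_normal(len(channels))

# Tikhonov regularisation makes the ill-posed system solvable
alpha = 1e-4
x = np.linalg.solve(A.T @ A + alpha * np.eye(len(nodes)), A.T @ b)
print(np.round(x, 3))
```

With only five data channels and twelve unknowns the plain system is underdetermined, which is why some form of regularisation (and the search over its parameters mentioned in the abstract) is unavoidable.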
We present computer simulations of a tip-tilt adaptive optics system in which stochastic optimization is applied to the problem of dynamic compensation of atmospheric turbulence. The system uses a simple measure of the light intensity that passes through a mask and is recorded on the image plane to generate signals for the tip-tilt mirror. A feedback system rotates the mirror adaptively and in phase with the rapidly changing atmospheric conditions. Computer simulations and a series of numerical experiments investigate the implementation of the method in the presence of a drifting atmosphere. In particular, the study examines the system's sensitivity to the rate of change of the atmospheric conditions and investigates the optimal size of the mirror's masking area and the optimal degree of stochasticity of the algorithm.
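The abstract does not specify the exact update rule; the sketch below uses stochastic parallel gradient descent (SPGD), a common stochastic scheme for this kind of metric-driven correction, with a toy masked-intensity metric and a randomly drifting atmosphere. The gain, dither amplitude and metric are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def masked_intensity(cmd, drift):
    # toy image-plane metric: light through the mask peaks when the
    # tip-tilt command cancels the atmospheric wander of the spot
    err = cmd - drift
    return np.exp(-np.dot(err, err) / 0.5)

cmd = np.zeros(2)              # tip and tilt commands to the mirror
drift = np.zeros(2)            # slowly drifting atmospheric tilt
gain, delta = 0.3, 0.05        # SPGD gain and dither amplitude (assumed)

for step in range(5000):
    drift += 0.002 * rng.standard_normal(2)         # atmosphere drifts
    pert = delta * rng.choice([-1.0, 1.0], size=2)  # random +/- dither
    dJ = (masked_intensity(cmd + pert, drift)
          - masked_intensity(cmd - pert, drift))    # two-sided metric probe
    cmd += gain * dJ * pert                         # climb toward brighter spot

print("residual pointing error:", cmd - drift)
```

The two knobs studied in the paper map directly onto this loop: the mask size sets the shape of the intensity metric, and the degree of stochasticity corresponds to the dither amplitude of the random perturbations.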