Distributed training of deep neural networks with Spark: The MareNostrum experience

Author
Cruz, L.; Tous, R.; Otero, B.
Type of activity
Journal article
Journal
Pattern Recognition Letters
Date of publication
2019-07-01
Volume
125
First page
174
Last page
178
DOI
10.1016/j.patrec.2019.01.020
Project funding
Computación de Altas Prestaciones VII
Models de Programacio i Entorns d'eXecució PARal.lels
Repository
http://hdl.handle.net/2117/169362
URL
https://www.sciencedirect.com/science/article/abs/pii/S0167865519300145
Abstract
Deploying a distributed deep learning technology stack on a large parallel system is a very complex process that involves integrating and configuring several layers of both general-purpose and custom software. The details of such deployments are rarely described in the literature. This paper presents the experiences observed during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered archi...
Keywords
DL4J, Deep Learning, HPC, MareNostrum, Performance, Scalability, Spark
Group of research
CAP - High Performance Computing Group
VIRTUOS - Virtualisation and Operating Systems

Participants