
Extending the OpenCHK Model with advanced checkpoint features

Author
Maroñas, M.; Mateo, S.; Keller, K.; Bautista Gomez, Leonardo Arturo; Ayguade, E.; Beltran, V.
Type of activity
Journal article
Journal
Future generation computer systems
Date of publication
2020-11
Volume
112
First page
738
Last page
750
DOI
10.1016/j.future.2020.06.003
Project funding
High performance computing VII
Models de Programacio i Entorns d'eXecució PARal.lels
Repository
http://hdl.handle.net/2117/192216
https://arxiv.org/abs/2006.16616
URL
https://www.sciencedirect.com/science/article/pii/S0167739X20304908
Abstract
One of the major challenges in using extreme scale systems efficiently is to mitigate the impact of faults. Application-level checkpoint/restart (CR) methods provide the best trade-off between productivity, robustness, and performance. There are many solutions implementing CR at the application level. They all provide advanced I/O capabilities to minimize the overhead introduced by CR. Nevertheless, there is still room for improvement in terms of programmability and flexibility, because end-user...
Citation
Maroñas, M. [et al.]. Extending the OpenCHK Model with advanced checkpoint features. "Future generation computer systems", November 2020, vol. 112, p. 738-750.
Keywords
Checkpoint/restart methods
Group of research
CAP - High Performance Computing Group

Participants

  • Maroñas Bravo, Marcos  (author)
  • Mateo Bellido, Sergi  (author)
  • Keller, Kai Rasmus  (author)
  • Bautista Gomez, Leonardo Arturo  (author)
  • Ayguade Parra, Eduard  (author)
  • Beltran Querol, Vicenç  (author)