Some Issues to Consider in the Management of Energy Consumption in HPC Systems with Fault Tolerance

Authors
Morán, Marina; Balladini, Javier; Rexachs del Rosario, Dolores; Rucci, Enzo
Publication year
2022
Language
English
Resource type
conference paper
Status
published version
Description
Exploring ways to reduce energy consumption during the execution of large-scale applications is essential to maintain and increase the enormous computing power achieved in HPC systems. Fault tolerance methods can affect power consumption. In particular, rollback-recovery methods that use uncoordinated checkpoints avoid the need for all processes to re-execute after a failure. In this context, actions can be taken on the nodes whose processes do not re-execute in order to reduce energy consumption. In this work, we describe some issues to consider when extending energy-saving strategies beyond the nodes that communicate directly with the failed one.
Instituto de Investigación en Informática
Subject
Computer Science
Energy consumption
Fault tolerance
Uncoordinated checkpoints
HPC
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/140642

Publication date
2022-07
URL
http://sedici.unlp.edu.ar/handle/10915/140642
Related identifiers
info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2126-0
info:eu-repo/semantics/reference/hdl/10915/139373
Format
application/pdf
Pages
17-22