Towards Management of Energy Consumption in HPC Systems with Fault Tolerance
- Authors
- Morán, Marina; Balladini, Javier; Rexachs del Rosario, Dolores; Rucci, Enzo
- Publication year
- 2020
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- High-performance computing continues to increase its computing power and energy efficiency. However, energy consumption also continues to rise, and finding ways to limit or reduce it is a crucial point in current research. For high-performance MPI applications, there are fault tolerance methods based on rollback recovery, such as uncoordinated checkpoints. With these methods, only some processes roll back when a failure occurs, while the rest continue to run. In this article, we focus on the processes that continue execution and propose a series of strategies to manage energy consumption when a failure occurs and uncoordinated checkpoints are used. We present an energy model to evaluate the strategies and, through simulation, analyze the behavior of an application under different configurations and failure times. As a result, we show the feasibility of improving energy efficiency in HPC systems in the presence of a failure.
Instituto de Investigación en Informática
Comisión de Investigaciones Científicas de la provincia de Buenos Aires
- Subject
- Computer science
Energy consumption
energy saving
Power management
Fault tolerance
uncoordinated checkpoint
HPC
Distributed memory
MPI
DVFS
ACPI
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/139146
View full record metadata
id |
SEDICI_46813f5cbd85bf4d530af115d180d574 |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/139146 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
dc.title.none.fl_str_mv |
Towards Management of Energy Consumption in HPC Systems with Fault Tolerance |
dc.creator.none.fl_str_mv |
Morán, Marina Balladini, Javier Rexachs del Rosario, Dolores Rucci, Enzo |
author |
Morán, Marina |
author_facet |
Morán, Marina Balladini, Javier Rexachs del Rosario, Dolores Rucci, Enzo |
author_role |
author |
author2 |
Balladini, Javier Rexachs del Rosario, Dolores Rucci, Enzo |
author2_role |
author author author |
dc.subject.none.fl_str_mv |
Informática Energy consumption energy saving Power management Fault tolerance uncoordinated checkpoint HPC Distributed memory MPI DVFS ACPI |
dc.description.none.fl_txt_mv |
High-performance computing continues to increase its computing power and energy efficiency. However, energy consumption also continues to rise, and finding ways to limit or reduce it is a crucial point in current research. For high-performance MPI applications, there are fault tolerance methods based on rollback recovery, such as uncoordinated checkpoints. With these methods, only some processes roll back when a failure occurs, while the rest continue to run. In this article, we focus on the processes that continue execution and propose a series of strategies to manage energy consumption when a failure occurs and uncoordinated checkpoints are used. We present an energy model to evaluate the strategies and, through simulation, analyze the behavior of an application under different configurations and failure times. As a result, we show the feasibility of improving energy efficiency in HPC systems in the presence of a failure. Instituto de Investigación en Informática Comisión de Investigaciones Científicas de la provincia de Buenos Aires |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/139146 |
url |
http://sedici.unlp.edu.ar/handle/10915/139146 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/isbn/978-1-7281-5957-7 info:eu-repo/semantics/altIdentifier/doi/10.1109/argencon49523.2020.9505498 info:eu-repo/semantics/altIdentifier/arxiv/2012.11396 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |