H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments

Authors
Royo, Ambrosio; Villamayor, Jorge; Castro-León, Marcela; Rexachs del Rosario, Dolores; Luque Fadón, Emilio
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
Even though cloud platforms promise to be reliable, several availability incidents show that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture that executes a parallel application on at least three different virtual clusters, or sites. The execution state of each site is saved periodically at another site and is recovered from there in case of failure. The paper details the configuration of the architecture and the experimental results obtained with three virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpointing tool. Our experiments show that execution time increased by 5% to 36% without failures and by 27% to 66% when failures occurred.
Facultad de Informática
Subject
Computer Science
cloud computing
cloud, fault-tolerance, high-performance computing, RADIC
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/69674

dc.date.none.fl_str_mv 2018-06
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/69674
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-950-34-1659-4
info:eu-repo/semantics/reference/hdl/10915/69464
info:eu-repo/semantics/reference/hdl/10915/71655
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
7-13
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar