High availability for parallel computers

Authors
Rexachs del Rosario, Dolores; Luque Fadón, Emilio
Publication year
2010
Language
English
Resource type
article
Status
published version
Description
Fault tolerance has become an important issue for parallel applications in recent years. Users of parallel systems want them to be reliable in two main dimensions: availability and data consistency. Availability can be provided by solutions such as RADIC, a fault-tolerant architecture with different protection levels that offers high availability with transparency, decentralization, flexibility, and scalability for message-passing systems. Transient faults may cause an application running on a computer system to be removed from execution; however, their greatest risk is undetected data corruption that changes the application's final result without anyone knowing. To evaluate the effects of transient faults on the robustness of applications, and to validate new fault detection mechanisms and strategies, we have developed a full-system simulation fault injection environment.
Facultad de Informática
Subject
Ciencias Informáticas
Fault tolerance
Reliability, availability, and serviceability
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/9677

Date issued
2010-10
Pages
110-116
Format
application/pdf
ISSN
1666-6038
Handle
http://sedici.unlp.edu.ar/handle/10915/9677
Full text (PDF)
http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct10-1.pdf
License
Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)