BSP cost and scalability analysis for MapReduce operations

Authors
Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; Sato, Liria M.; Da Silva, Fabrício A.B.
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
Data abundance creates a need for powerful, easy-to-use tools that support processing large amounts of data. MapReduce has been increasingly adopted by many companies for over a decade, and more recently it has attracted the attention of a growing number of researchers in several areas. One main advantage is that the complex details of parallel processing, such as network programming, task scheduling, data placement, and fault tolerance, are hidden in a conceptually simple framework. MapReduce is supported by mature software technologies for deployment in data centers, such as Hadoop. As MapReduce becomes popular for high-performance applications, many questions arise concerning its performance and efficiency. In this paper, we formally demonstrate lower bounds on the isoefficiency function for MapReduce applications when these applications can be modeled as BSP jobs. We also demonstrate how communication and synchronization costs can be dominant for MapReduce computations and discuss the conditions under which such scalability limits are valid. To our knowledge, this is the first study that demonstrates scalability bounds for MapReduce applications. We also discuss how some MapReduce implementations, such as Hadoop, can mitigate such costs to approach linear or near-linear speedups.
Affiliation: Senger, Hermes. Universidade Federal do São Carlos; Brazil
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Arantes, Luciana. Universite Pierre et Marie Curie; France
Affiliation: Marcondes, Cesar A. C. Universidade Federal do São Carlos; Brazil
Affiliation: Marin, Mauricio. Universidad de Santiago de Chile; Chile
Affiliation: Sato, Liria M. Universidade de Sao Paulo; Brazil
Affiliation: Da Silva, Fabrício A.B. Fundación Oswaldo Cruz; Brazil
Subject
BSP
Hadoop
MapReduce
Scalability
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/60660

id CONICETDig_fc6a7292a58746b6846576bf4e230386
oai_identifier_str oai:ri.conicet.gov.ar:11336/60660
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv BSP cost and scalability analysis for MapReduce operations
dc.creator.none.fl_str_mv Senger, Hermes
Gil Costa, Graciela Verónica
Arantes, Luciana
Marcondes, Cesar A. C.
Marin, Mauricio
Sato, Liria M.
Da Silva, Fabrício A.B.
dc.subject.none.fl_str_mv Bsp
Hadoop
Mapreduce
Scalability
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2016
dc.date.none.fl_str_mv 2016-06
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/60660
Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; et al.; BSP cost and scalability analysis for MapReduce operations; John Wiley & Sons Ltd; Concurrency and Computation: Practice and Experience; 28; 8; 6-2016; 2503-2527
1532-0626
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1002/cpe.3628
info:eu-repo/semantics/altIdentifier/url/https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.3628
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv John Wiley & Sons Ltd
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar