BSP cost and scalability analysis for MapReduce operations
- Authors
- Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; Sato, Liria M.; Da Silva, Fabrício A.B.
- Year of publication
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Data abundance creates a need for powerful, easy-to-use tools that support processing large amounts of data. MapReduce has been increasingly adopted by many companies for over a decade, and more recently it has attracted the attention of a growing number of researchers in several areas. One main advantage is that the complex details of parallel processing, such as network programming, task scheduling, data placement, and fault tolerance, are hidden in a conceptually simple framework. MapReduce is supported by mature software technologies, such as Hadoop, for deployment in data centers. As MapReduce becomes popular for high-performance applications, many questions arise concerning its performance and efficiency. In this paper, we formally demonstrate lower bounds on the isoefficiency function for MapReduce applications, when these applications can be modeled as BSP jobs. We also demonstrate how communication and synchronization costs can be dominant for MapReduce computations and discuss the conditions under which such scalability limits are valid. To our knowledge, this is the first study that demonstrates scalability bounds for MapReduce applications. We also discuss how some MapReduce implementations, such as Hadoop, can mitigate such costs to approach linear or near-linear speedups.
Affiliation: Senger, Hermes. Universidade Federal do São Carlos; Brazil
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Arantes, Luciana. Universite Pierre et Marie Curie; France
Affiliation: Marcondes, Cesar A. C. Universidade Federal do São Carlos; Brazil
Affiliation: Marin, Mauricio. Universidad de Santiago de Chile; Chile
Affiliation: Sato, Liria M. Universidade de Sao Paulo; Brazil
Affiliation: Da Silva, Fabrício A.B. Fundación Oswaldo Cruz; Brazil
- Subject
- Bsp, Hadoop, Mapreduce, Scalability
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/60660
Full record metadata
| Field | Value |
|---|---|
| id | CONICETDig_fc6a7292a58746b6846576bf4e230386 |
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/60660 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | BSP cost and scalability analysis for MapReduce operations |
| title | BSP cost and scalability analysis for MapReduce operations |
| spellingShingle | BSP cost and scalability analysis for MapReduce operations Senger, Hermes Bsp Hadoop Mapreduce Scalability |
| title_short | BSP cost and scalability analysis for MapReduce operations |
| title_full | BSP cost and scalability analysis for MapReduce operations |
| title_fullStr | BSP cost and scalability analysis for MapReduce operations |
| title_full_unstemmed | BSP cost and scalability analysis for MapReduce operations |
| title_sort | BSP cost and scalability analysis for MapReduce operations |
| dc.creator.none.fl_str_mv | Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; Sato, Liria M.; Da Silva, Fabrício A.B. |
| author | Senger, Hermes |
| author_facet | Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; Sato, Liria M.; Da Silva, Fabrício A.B. |
| author_role | author |
| author2 | Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; Sato, Liria M.; Da Silva, Fabrício A.B. |
| author2_role | author author author author author author |
| dc.subject.none.fl_str_mv | Bsp; Hadoop; Mapreduce; Scalability |
| topic | Bsp; Hadoop; Mapreduce; Scalability |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
| publishDate | 2016 |
| dc.date.none.fl_str_mv | 2016-06 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/60660 Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; et al.; BSP cost and scalability analysis for MapReduce operations; John Wiley & Sons Ltd; Concurrency and Computation: Practice and Experience; 28; 8; 6-2016; 2503-2527 1532-0626 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/60660 |
| identifier_str_mv | Senger, Hermes; Gil Costa, Graciela Verónica; Arantes, Luciana; Marcondes, Cesar A. C.; Marin, Mauricio; et al.; BSP cost and scalability analysis for MapReduce operations; John Wiley & Sons Ltd; Concurrency and Computation: Practice and Experience; 28; 8; 6-2016; 2503-2527 1532-0626 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1002/cpe.3628 info:eu-repo/semantics/altIdentifier/url/https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.3628 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | John Wiley & Sons Ltd |
| publisher.none.fl_str_mv | John Wiley & Sons Ltd |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |