Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

Autores
Rucci, Enzo; De Giusti, Armando Eduardo; Naiouf, Marcelo
Año de publicación
2017
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architec- ture.While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound ap- plications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.
XVIII Workshop de Procesamiento Distribuido y Paralelo (WPDP)
Materia
Ingenierías y Tecnologías
Xeon Phi
Knights Landing
Floyd-Warshall
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repositorio
CIC Digital (CICBA)
Institución
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI Identificador
oai:digital.cic.gba.gob.ar:11746/9033

id CICBA_4bea48a1560b11c7c8d74da1c3b42754
oai_identifier_str oai:digital.cic.gba.gob.ar:11746/9033
network_acronym_str CICBA
repository_id_str 9441
network_name_str CIC Digital (CICBA)
spelling Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case StudyRucci, EnzoDe Giusti, Armando EduardoNaiouf, MarceloIngenierías y TecnologíasXeon PhiKnights LandingFloyd-WarshallManycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architec- ture.While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound ap- plications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.XVIII Workshop de Procesamiento Distribuido y Paralelo (WPDP)2017info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdfhttps://digital.cic.gba.gob.ar/handle/11746/9033enginfo:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/reponame:CIC Digital (CICBA)instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Airesinstacron:CICBA2025-09-29T13:39:54Zoai:digital.cic.gba.gob.ar:11746/9033Institucionalhttp://digital.cic.gba.gob.arOrganismo científico-tecnológicoNo correspondehttp://digital.cic.gba.gob.ar/oai/snrdmarisa.degiusti@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:94412025-09-29 13:39:55.097CIC Digital (CICBA) - Comisión de Investigaciones Científicas de la Provincia de Buenos Airesfalse
dc.title.none.fl_str_mv Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
title Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
spellingShingle Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
Rucci, Enzo
Ingenierías y Tecnologías
Xeon Phi
Knights Landing
Floyd-Warshall
title_short Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
title_full Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
title_fullStr Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
title_full_unstemmed Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
title_sort Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study
dc.creator.none.fl_str_mv Rucci, Enzo
De Giusti, Armando Eduardo
Naiouf, Marcelo
author Rucci, Enzo
author_facet Rucci, Enzo
De Giusti, Armando Eduardo
Naiouf, Marcelo
author_role author
author2 De Giusti, Armando Eduardo
Naiouf, Marcelo
author2_role author
author
dc.subject.none.fl_str_mv Ingenierías y Tecnologías
Xeon Phi
Knights Landing
Floyd-Warshall
topic Ingenierías y Tecnologías
Xeon Phi
Knights Landing
Floyd-Warshall
dc.description.none.fl_txt_mv Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architec- ture.While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound ap- plications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.
XVIII Workshop de Procesamiento Distribuido y Paralelo (WPDP)
description Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architec- ture.While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm as a representative case study of graph and memory-bound ap- plications. Starting from the default serial version, we show how data, thread and compiler level optimizations help the parallel implementation to reach 338 GFLOPS.
publishDate 2017
dc.date.none.fl_str_mv 2017
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv https://digital.cic.gba.gob.ar/handle/11746/9033
url https://digital.cic.gba.gob.ar/handle/11746/9033
dc.language.none.fl_str_mv eng
language eng
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:CIC Digital (CICBA)
instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
instacron:CICBA
reponame_str CIC Digital (CICBA)
collection CIC Digital (CICBA)
instname_str Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
instacron_str CICBA
institution CICBA
repository.name.fl_str_mv CIC Digital (CICBA) - Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
repository.mail.fl_str_mv marisa.degiusti@sedici.unlp.edu.ar
_version_ 1844618585490587648
score 13.070432