Lessons learned from contrasting a BLAS kernel implementations
- Authors
- More, Andres
- Publication year
- 2013
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast CPU versus GPU implementation effort and complexity of an optimized BLAS routine, not considering performance. This work contributes a sample procedure for comparing BLAS kernel implementations, and describes how to start using GPU libraries and offloading, how to analyze their performance, and the issues faced and how they were solved.
WPDP - XIII Workshop procesamiento distribuido y paralelo
Red de Universidades con Carreras en Informática (RedUNCI)
- Subject
- Computer Science
BLAS libraries
SSPR kernel
CPU architecture
GPU architecture
performance analysis
performance measurement
software optimization
Software libraries
Optimization
PROCESSOR ARCHITECTURES
Performance Analysis and Design Aids
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/31702
id | SEDICI_aa979b344c7a824dc6b8f8d229f6f8c4 |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/31702 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Lessons learned from contrasting a BLAS kernel implementations |
dc.creator.none.fl_str_mv | More, Andres |
author_role | author |
dc.subject.none.fl_str_mv | Computer Science; BLAS libraries; SSPR kernel; CPU architecture; GPU architecture; performance analysis; performance measurement; software optimization; Software libraries; Optimization; PROCESSOR ARCHITECTURES; Performance Analysis and Design Aids |
dc.description.none.fl_txt_mv | This work reviews the experience of implementing different versions of the SSPR rank-one update operation of the BLAS library. The main objective was to contrast CPU versus GPU implementation effort and complexity of an optimized BLAS routine, not considering performance. This work contributes a sample procedure for comparing BLAS kernel implementations, and describes how to start using GPU libraries and offloading, how to analyze their performance, and the issues faced and how they were solved. WPDP - XIII Workshop procesamiento distribuido y paralelo. Red de Universidades con Carreras en Informática (RedUNCI). |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/31702 |
url | http://sedici.unlp.edu.ar/handle/10915/31702 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/2.5/ar/; Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5) |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | http://creativecommons.org/licenses/by-nc-sa/2.5/ar/; Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5) |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
reponame_str | SEDICI (UNLP) |
collection | SEDICI (UNLP) |
instname_str | Universidad Nacional de La Plata |
instacron_str | UNLP |
institution | UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
_version_ | 1842260151792304128 |
score | 13.13397 |