Flexible Quantization for Efficient Convolutional Neural Networks

Authors
Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel
Publication year
2024
Language
English
Resource type
article
Status
published version
Description
This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN, without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm demonstrates the capability to achieve compression levels equivalent to 2 bits without an accuracy loss and even levels equivalent to ∼1.58 bits, but with a loss in performance of only ∼0.6%.
Affiliation: Zacchigna, Federico Giordano. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina
Affiliation: Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina
Affiliation: Lutenberg, Ariel. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
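The "equivalent bits" figures in the description can be related to the number of quantization levels through a base-2 logarithm: three levels correspond to log2(3) ≈ 1.58 bits and four levels to exactly 2 bits. The sketch below is only an illustration of that arithmetic and of nearest-level rounding; it is not the authors' NUUQ algorithm. The function names, the sample weights, and the choice of three levels are assumptions introduced here.

```python
import math


def equivalent_bits(num_levels: int) -> float:
    """Bits needed to index one of `num_levels` distinct values."""
    if num_levels < 2:
        raise ValueError("need at least 2 levels")
    return math.log2(num_levels)


def quantize_to_levels(values, levels):
    """Map each value to the nearest entry of `levels` (nearest-neighbour rounding)."""
    return [min(levels, key=lambda q: abs(q - v)) for v in values]


# Hypothetical example: three levels, i.e. a level count that is not a power of two.
ternary_levels = [-1.0, 0.0, 1.0]
weights = [-0.9, -0.2, 0.05, 0.4, 0.8]

print(quantize_to_levels(weights, ternary_levels))
print(round(equivalent_bits(3), 2))  # three levels
print(equivalent_bits(4))            # four levels
```

Treating the level count as a free parameter, rather than fixing it at a power of two, is what lets a fractional bit figure such as ∼1.58 appear at all.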
Subject
CNN
quantization
uniform
non-uniform
mixed-precision
FPGA
ASIC
edge devices
embedded systems
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/236859

id CONICETDig_9a64063c49a0356a03f7554b2550ffba
oai_identifier_str oai:ri.conicet.gov.ar:11336/236859
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv Flexible Quantization for Efficient Convolutional Neural Networks
dc.creator.none.fl_str_mv Zacchigna, Federico Giordano
Lew, Sergio Eduardo
Lutenberg, Ariel
dc.subject.none.fl_str_mv CNN
quantization
uniform
non-uniform
mixed-precision
FPGA
ASIC
edge devices
embedded systems
purl_subject.fl_str_mv https://purl.org/becyt/ford/2.2
https://purl.org/becyt/ford/2
publishDate 2024
dc.date.none.fl_str_mv 2024-05
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/236859
Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel; Flexible Quantization for Efficient Convolutional Neural Networks; MDPI; Electronics; 13; 10; 5-2024; 1-16
2079-9292
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2079-9292/13/10/1923
info:eu-repo/semantics/altIdentifier/doi/10.3390/electronics13101923
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar