Flexible Quantization for Efficient Convolutional Neural Networks

Autores: Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel
Año de publicación: 2024
Idioma: inglés
Tipo de recurso: artículo
Estado: versión publicada
Descripción: This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN, without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm demonstrates the capability to achieve compression levels equivalent to 2 bits without an accuracy loss and even levels equivalent to ∼1.58 bits, but with a loss in performance of only ∼0.6%.
Fil: Zacchigna, Federico Giordano. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina
Fil: Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina
Fil: Lutenberg, Ariel. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Materia: CNN
quantization
uniform
non-uniform
mixed-precision
FPGA
ASIC
edge devices
embedded systems
Nivel de accesibilidad: acceso abierto
Condiciones de uso: https://creativecommons.org/licenses/by/2.5/ar/
Repositorio
Institución: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador: oai:ri.conicet.gov.ar:11336/236859

Acceder

id	CONICETDig_9a64063c49a0356a03f7554b2550ffba
oai_identifier_str	oai:ri.conicet.gov.ar:11336/236859
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
spelling	Flexible Quantization for Efficient Convolutional Neural NetworksZacchigna, Federico GiordanoLew, Sergio EduardoLutenberg, ArielCNNquantizationuniformnon-uniformmixed-precisionFPGAASICedge devicesembedded systemshttps://purl.org/becyt/ford/2.2https://purl.org/becyt/ford/2This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN, without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm demonstrates the capability to achieve compression levels equivalent to 2 bits without an accuracy loss and even levels equivalent to ∼1.58 bits, but with a loss in performance of only ∼0.6%.Fil: Zacchigna, Federico Giordano. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; ArgentinaFil: Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; ArgentinaFil: Lutenberg, Ariel. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; ArgentinaMDPI2024-05info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/236859Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel; Flexible Quantization for Efficient Convolutional Neural Networks; MDPI; Electronics; 13; 10; 5-2024; 1-162079-9292CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2079-9292/13/10/1923info:eu-repo/semantics/altIdentifier/doi/10.3390/electronics13101923info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-12-03T08:46:54Zoai:ri.conicet.gov.ar:11336/236859instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-12-03 08:46:54.307CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv	Flexible Quantization for Efficient Convolutional Neural Networks
title	Flexible Quantization for Efficient Convolutional Neural Networks
spellingShingle	Flexible Quantization for Efficient Convolutional Neural Networks Zacchigna, Federico Giordano CNN quantization uniform non-uniform mixed-precision FPGA ASIC edge devices embedded systems
title_short	Flexible Quantization for Efficient Convolutional Neural Networks
title_full	Flexible Quantization for Efficient Convolutional Neural Networks
title_fullStr	Flexible Quantization for Efficient Convolutional Neural Networks
title_full_unstemmed	Flexible Quantization for Efficient Convolutional Neural Networks
title_sort	Flexible Quantization for Efficient Convolutional Neural Networks
dc.creator.none.fl_str_mv	Zacchigna, Federico Giordano Lew, Sergio Eduardo Lutenberg, Ariel
author	Zacchigna, Federico Giordano
author_facet	Zacchigna, Federico Giordano Lew, Sergio Eduardo Lutenberg, Ariel
author_role	author
author2	Lew, Sergio Eduardo Lutenberg, Ariel
author2_role	author author
dc.subject.none.fl_str_mv	CNN quantization uniform non-uniform mixed-precision FPGA ASIC edge devices embedded systems
topic	CNN quantization uniform non-uniform mixed-precision FPGA ASIC edge devices embedded systems
purl_subject.fl_str_mv	https://purl.org/becyt/ford/2.2 https://purl.org/becyt/ford/2
dc.description.none.fl_txt_mv	This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN, without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm demonstrates the capability to achieve compression levels equivalent to 2 bits without an accuracy loss and even levels equivalent to ∼1.58 bits, but with a loss in performance of only ∼0.6%. Fil: Zacchigna, Federico Giordano. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina Fil: Lew, Sergio Eduardo. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina Fil: Lutenberg, Ariel. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Electronica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
description	This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce a method called non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN, without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm demonstrates the capability to achieve compression levels equivalent to 2 bits without an accuracy loss and even levels equivalent to ∼1.58 bits, but with a loss in performance of only ∼0.6%.
publishDate	2024
dc.date.none.fl_str_mv	2024-05
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/236859 Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel; Flexible Quantization for Efficient Convolutional Neural Networks; MDPI; Electronics; 13; 10; 5-2024; 1-16 2079-9292 CONICET Digital CONICET
url	http://hdl.handle.net/11336/236859
identifier_str_mv	Zacchigna, Federico Giordano; Lew, Sergio Eduardo; Lutenberg, Ariel; Flexible Quantization for Efficient Convolutional Neural Networks; MDPI; Electronics; 13; 10; 5-2024; 1-16 2079-9292 CONICET Digital CONICET
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2079-9292/13/10/1923 info:eu-repo/semantics/altIdentifier/doi/10.3390/electronics13101923
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	MDPI
publisher.none.fl_str_mv	MDPI
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_	1850504851794952192
score	13.214268

Flexible Quantization for Efficient Convolutional Neural Networks

Publicaciones similares