Image Classification with the Fisher Vector: Theory and Practice

Authors: Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob
Publication year: 2013
Language: English
Resource type: article
Status: published version
Description: A standard approach to describing an image for classification and retrieval is to extract a set of local patch descriptors, encode them into a high-dimensional vector, and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements, which leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K), with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
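The patch encoding idea summarized above (describing descriptors by their deviation from a Gaussian mixture model) can be illustrated with a short NumPy sketch. This is a minimal, non-authoritative sketch of a commonly used form of the Fisher vector, not the authors' code: it assumes a diagonal-covariance GMM whose weights, means and standard deviations have already been fitted (typically with EM on a sample of training descriptors), and the function name `fisher_vector` and the random toy parameters are hypothetical. It computes soft assignments of each descriptor to the Gaussians, the normalized gradients of the log-likelihood with respect to the means and standard deviations (the mixture-weight gradient is omitted), averages them over the image's descriptors, and then applies signed square-root and L2 normalization.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Hypothetical helper: encode local descriptors X (T x D) against a
    diagonal GMM with K components (weights: (K,), means/sigmas: (K, D),
    sigmas are standard deviations). Returns a 2*K*D vector."""
    X = np.atleast_2d(X)
    T, D = X.shape
    # Whitened deviations (x_t - mu_k) / sigma_k for every pair: shape (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian: shape (T, K)
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * D * np.log(2 * np.pi))
    # Soft assignments gamma_t(k) (posteriors), computed stably
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients w.r.t. means and standard deviations, averaged over T
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1)
               / (T * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Signed square-root (power) normalization, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Toy usage with random placeholder parameters (not a trained model)
rng = np.random.default_rng(0)
K, D = 4, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
X = rng.normal(size=(100, D))  # 100 local descriptors from one "image"
fv = fisher_vector(X, weights, means, sigmas)
print(fv.shape)  # (64,) since 2 * K * D = 64
```

Unlike a bag-of-visual-words histogram, which only counts how many descriptors fall into each cell, this signature stores first- and second-order deviations per Gaussian, so even a small mixture yields a high-dimensional image-level vector that can be fed to a linear classifier.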
Affiliation: Sanchez, Jorge Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Córdoba. Centro de Investigación y Estudios de Matemática de Córdoba (p); Argentina
Affiliation: Perronnin, Florent. Xerox Research Centre Europe; France
Affiliation: Mensink, Thomas. University of Amsterdam. Intelligent Systems Lab Amsterdam; Netherlands
Affiliation: Verbeek, Jakob. LEAR Team, INRIA Grenoble; France
Subjects: Image Classification
Large-Scale Classification
Bag-Of-Visual Words
Fisher Vector
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ (CC BY-NC-SA 2.5 Argentina)
Repository: CONICET Digital (CONICET)
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/12271

Citation: Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob. Image Classification with the Fisher Vector: Theory and Practice. International Journal of Computer Vision, vol. 105, no. 3, pp. 222-245. Springer, June 2013. ISSN 0920-5691.
Handle: http://hdl.handle.net/11336/12271
Publisher's version: http://link.springer.com/article/10.1007%2Fs11263-013-0636-x
DOI: http://dx.doi.org/10.1007/s11263-013-0636-x