Image Classification with the Fisher Vector: Theory and Practice

Authors
Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob
Year of publication
2013
Language
English
Resource type
article
Status
published version
Description
A standard approach to describing an image for classification and retrieval is to extract a set of local patch descriptors, encode them into a high-dimensional vector, and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements, which leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
Affiliation: Sanchez, Jorge Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Córdoba. Centro de Investigación y Estudios de Matemática de Córdoba(p); Argentina
Affiliation: Perronnin, Florent. Xerox Research Centre Europe; France
Affiliation: Mensink, Thomas. University of Amsterdam. Intelligent Systems Lab Amsterdam; Netherlands
Affiliation: Verbeek, Jakob. LEAR Team, INRIA Grenoble; France
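The patch encoding summarized in the description can be illustrated with a minimal pure-Python sketch of Fisher vector encoding. This is written for illustration, not taken from the authors' code. It assumes a Gaussian mixture model (GMM) with diagonal covariances whose weights, means and standard deviations are already known (in practice they would be fitted to training descriptors, e.g. with EM). It keeps only the gradients with respect to the means and variances and applies the signed square-root and L2 normalizations commonly used with Fisher vectors. The names `gmm_posteriors` and `fisher_vector` and the toy GMM values are hypothetical.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """Soft assignment gamma_k(x) of one descriptor x to each diagonal Gaussian k."""
    log_probs = []
    for w, mu, sd in zip(weights, means, sigmas):
        # log(w_k) + log N(x; mu_k, diag(sd_k^2))
        lp = math.log(w)
        for xi, mi, si in zip(x, mu, sd):
            lp -= 0.5 * math.log(2.0 * math.pi * si * si) + (xi - mi) ** 2 / (2.0 * si * si)
        log_probs.append(lp)
    # Log-sum-exp trick keeps the normalization numerically stable.
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode local descriptors as a 2*K*D vector of mean and variance gradients."""
    if not descriptors:
        raise ValueError("need at least one descriptor")
    K, D, T = len(weights), len(means[0]), len(descriptors)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    # Accumulate soft-assigned first- and second-order statistics over all patches.
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / sigmas[k][d]  # whitened deviation from mu_k
                g_mu[k][d] += gamma[k] * u
                g_sigma[k][d] += gamma[k] * (u * u - 1.0)
    # Average over patches and rescale by the diagonal Fisher information terms.
    fv = []
    for k in range(K):
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])
    # Signed square-root ("power") normalization followed by L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


if __name__ == "__main__":
    # Toy 2-component GMM in 2-D (hypothetical values, for illustration only).
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [3.0, 3.0]]
    sigmas = [[1.0, 1.0], [1.0, 1.0]]
    patches = [[0.1, -0.2], [2.9, 3.3], [0.5, 0.4]]
    fv = fisher_vector(patches, weights, means, sigmas)
    print(len(fv))  # 8, i.e. 2 * K * D with K = 2, D = 2
```

For comparison, a bag-of-visual-words histogram built by hard-assigning each patch to the nearest of the same K components would have only K dimensions; the Fisher vector instead records 2KD first- and second-order statistics per image. The product-quantization compression mentioned in the description is not shown here.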
Subjects
Image Classification
Large-Scale Classification
Bag-Of-Visual Words
Fisher Vector
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/12271

Handle
http://hdl.handle.net/11336/12271
Citation
Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob. Image Classification with the Fisher Vector: Theory and Practice. International Journal of Computer Vision, vol. 105, no. 3, pp. 222-245. Springer, June 2013.
Publisher
Springer
ISSN
0920-5691
Publisher URL
http://link.springer.com/article/10.1007%2Fs11263-013-0636-x
DOI
http://dx.doi.org/10.1007/s11263-013-0636-x