Image Classification with the Fisher Vector: Theory and Practice
- Authors
- Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob
- Publication year
- 2013
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K), with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
Affiliation: Sanchez, Jorge Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Córdoba. Centro de Investigación y Estudios de Matemática de Córdoba(p); Argentina
Affiliation: Perronnin, Florent. Xerox Research Centre Europe; France
Affiliation: Mensink, Thomas. University of Amsterdam. Intelligent Systems Lab Amsterdam; Netherlands
Affiliation: Verbeek, Jakob. LEAR Team, INRIA Grenoble; France
- Subject
- Image Classification; Large-Scale Classification; Bag-Of-Visual Words; Fisher Vector
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/12271
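The encoding summarized in the description (patches described by their deviation from a Gaussian mixture model, then pooled into one image-level vector) can be illustrated with a minimal sketch. This record does not give the formulas, so the code below assumes the commonly used closed-form gradients with respect to the means and standard deviations of a diagonal-covariance GMM, followed by signed square-root and L2 normalization as a typical post-processing step. The function names, the toy GMM parameters, and the toy descriptors are all hypothetical; GMM training and product quantization are not shown.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode T local descriptors as a Fisher vector (illustrative sketch).

    descriptors: list of T descriptors, each a list of D floats
    weights:     K mixture weights (assumed to sum to 1)
    means:       K mean vectors of length D
    sigmas:      K vectors of per-dimension standard deviations (diagonal GMM)
    Returns a list of length 2*K*D: mean gradients, then std-dev gradients.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]

    for x in descriptors:
        # log(w_k * N(x | mu_k, diag(sigma_k^2))) for every component k.
        log_p = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                lp -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * math.log(2 * math.pi)
            log_p.append(lp)
        # Soft assignment of x to each component, computed in a stable way.
        m = max(log_p)
        unnorm = [math.exp(lp - m) for lp in log_p]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z                 # deviation in the mean
                grad_sigma[k][d] += gamma * (z * z - 1.0)  # deviation in the variance

    # Pool over descriptors (average) and scale each component by its weight.
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(weights[k])) for g in grad_mu[k]]
    for k in range(K):
        fv += [g / (T * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k]]
    return fv


def normalize(fv):
    """Signed square root followed by L2 normalization."""
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in rooted))
    return [v / norm for v in rooted] if norm > 0 else rooted


# Toy example: K=2 Gaussians in D=2 dimensions and T=3 descriptors (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
patches = [[0.1, -0.2], [3.9, 4.2], [0.3, 0.1]]
signature = normalize(fisher_vector(patches, weights, means, sigmas))
print(len(signature))  # 2 * K * D = 8
```

The resulting signature has a fixed length of 2·K·D regardless of how many patches the image contains, which is what allows it to be fed directly to a linear classifier.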
Full record metadata
Field | Value |
---|---|
id | CONICETDig_6d83381de7f0e12f4ad144c2f0720c37 |
oai_identifier_str | oai:ri.conicet.gov.ar:11336/12271 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Image Classification with the Fisher Vector: Theory and Practice |
dc.creator.none.fl_str_mv | Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob |
author | Sanchez, Jorge Adrian |
author_facet | Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob |
author_role | author |
author2 | Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob |
author2_role | author; author; author |
dc.subject.none.fl_str_mv | Image Classification; Large-Scale Classification; Bag-Of-Visual Words; Fisher Vector |
topic | Image Classification; Large-Scale Classification; Bag-Of-Visual Words; Fisher Vector |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013-06 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | Handle: http://hdl.handle.net/11336/12271. Citation: Sanchez, Jorge Adrian; Perronnin, Florent; Mensink, Thomas; Verbeek, Jakob; Image Classification with the Fisher Vector: Theory and Practice; Springer; International Journal of Computer Vision; 105; 3; 6-2013; 222-245. ISSN: 0920-5691 |
url | http://hdl.handle.net/11336/12271 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://link.springer.com/article/10.1007%2Fs11263-013-0636-x; info:eu-repo/semantics/altIdentifier/url/http://dx.doi.org/10.1007/s11263-013-0636-x |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf; application/pdf |
dc.publisher.none.fl_str_mv | Springer |
publisher.none.fl_str_mv | Springer |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |