Human Action Recognition in Videos using a Robust CNN LSTM Approach

Authors: Orozco, Carlos Ismael; Xamena, Eduardo; Buemi, María Elena; Berlles, Julio Jacobo
Publication year: 2020
Language: English
Resource type: article
Status: published version
Description: Action recognition in videos is currently a topic of interest in computer vision because of potential applications such as multimedia indexing and surveillance in public spaces. In this paper we propose: (1) An implementation of a CNN–LSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts features from the input video. Then, an LSTM classifies the video sequence into a particular class. (2) A study of how the number of LSTM units affects the performance of the system. For the training and test phases, we used the KTH, UCF-11 and HMDB-51 datasets. (3) An evaluation of the performance of our system using accuracy as the evaluation metric, given the existing balance of the classes in the datasets. We obtain 93%, 91% and 47% accuracy on KTH, UCF-11 and HMDB-51 respectively, improving on state-of-the-art results for the first two. Besides the results attained, the main contribution of this work lies in the evaluation of different CNN-LSTM architectures for the action recognition task.
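The abstract justifies accuracy as the evaluation metric by the balance of the classes. A minimal sketch of why that holds, using hypothetical toy labels (not data from the paper) and illustrative function names: when every class has the same number of samples, plain accuracy equals the mean of the per-class recalls, so no class dominates the score.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_recall(y_true, y_pred):
    """Average over classes of the per-class recall (balanced accuracy)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels: 3 action classes with 4 clips each (balanced).
y_true = ["walk"] * 4 + ["run"] * 4 + ["jump"] * 4
y_pred = ["walk", "walk", "walk", "run",
          "run", "run", "run", "run",
          "jump", "jump", "walk", "walk"]

acc = accuracy(y_true, y_pred)
bal = mean_per_class_recall(y_true, y_pred)
print(f"accuracy={acc:.3f}  mean per-class recall={bal:.3f}")
```

With unbalanced classes the two numbers generally differ, which is when accuracy alone can be misleading.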
Affiliation: Orozco, Carlos Ismael. Universidad Nacional de Salta. Facultad de Ciencias Exactas. Departamento de Informática; Argentina
Affiliation: Xamena, Eduardo. Universidad Nacional de Salta. Facultad de Ciencias Exactas. Departamento de Informática; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Buemi, María Elena. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Computación; Argentina
Affiliation: Berlles, Julio Jacobo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Subject: ACTION RECOGNITION
CONVOLUTIONAL NEURAL NETWORKS
LONG SHORT-TERM MEMORY NEURAL NETWORKS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/141949

id	CONICETDig_7c58adfa9a54166749d4b0641fbc986f
oai_identifier_str	oai:ri.conicet.gov.ar:11336/141949
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	Human Action Recognition in Videos using a Robust CNN LSTM Approach Reconocimiento de Acciones Humanas en Videos usando una Red Neuronal CNN LSTM Robusta
title	Human Action Recognition in Videos using a Robust CNN LSTM Approach
dc.creator.none.fl_str_mv	Orozco, Carlos Ismael Xamena, Eduardo Buemi, María Elena Berlles, Julio Jacobo
author	Orozco, Carlos Ismael
author_facet	Orozco, Carlos Ismael Xamena, Eduardo Buemi, María Elena Berlles, Julio Jacobo
author_role	author
author2	Xamena, Eduardo Buemi, María Elena Berlles, Julio Jacobo
author2_role	author author author
dc.subject.none.fl_str_mv	ACTION RECOGNITION; CONVOLUTIONAL NEURAL NETWORKS; LONG SHORT-TERM MEMORY NEURAL NETWORKS
topic	ACTION RECOGNITION; CONVOLUTIONAL NEURAL NETWORKS; LONG SHORT-TERM MEMORY NEURAL NETWORKS
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1
publishDate	2020
dc.date.none.fl_str_mv	2020-12-30
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/141949 | Orozco, Carlos Ismael; Xamena, Eduardo; Buemi, María Elena; Berlles, Julio Jacobo; Human Action Recognition in Videos using a Robust CNN LSTM Approach; Universidad de Palermo. Facultad de Ingeniería; Ciencia y Tecnología; 2020; 20; 30-12-2020; 23-36 | 1850-0870 | 2344-9217 | CONICET Digital | CONICET
url	http://hdl.handle.net/11336/141949
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://dspace.palermo.edu/ojs/index.php/cyt/article/view/3288 info:eu-repo/semantics/altIdentifier/doi/10.18682/cyt.vi0.3288
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Universidad de Palermo. Facultad de Ingeniería
publisher.none.fl_str_mv	Universidad de Palermo. Facultad de Ingeniería
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
