Interpreting Natural Language Instructions Using Language, Vision, and Behavior
- Authors
- Benotti, Luciana; Lau, Tessa; Villalba, Martin
- Publication year
- 2014
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. The approach is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
Affiliation: Benotti, Luciana. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física. Sección Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lau, Tessa. Savioke; United States
Affiliation: Villalba, Martin. Universitat Potsdam; Germany. Universidad Nacional de Córdoba; Argentina
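As an editor's illustration of the classification framing described in the abstract, here is a minimal Python sketch: each action that is possible in the current situation is scored from language, vision, and behavior features, and the highest-scoring action is chosen. The Action fields, the three concrete features, and the logistic-regression classifier are assumptions made for this sketch, not the implementation evaluated in the paper.

```python
# Hypothetical sketch (not the authors' code): situated instruction
# interpretation cast as classification over the actions possible in the
# situation, combining language, vision, and behavior information.
from dataclasses import dataclass

from sklearn.linear_model import LogisticRegression


@dataclass
class Action:
    name: str             # e.g. "open(door-3)" -- hypothetical identifier
    description: str      # words describing the action and its target
    target_visible: bool  # vision: is the target in the follower's view?
    behavior_freq: float  # behavior: how often followers did this action next


def features(instruction: str, action: Action) -> list[float]:
    """Combine the three information sources into one feature vector."""
    inst_words = set(instruction.lower().split())
    desc_words = set(action.description.lower().split())
    # Language feature: word overlap between instruction and action description.
    overlap = len(inst_words & desc_words) / max(len(desc_words), 1)
    return [overlap, float(action.target_visible), action.behavior_freq]


def interpret(instruction: str, possible: list[Action],
              clf: LogisticRegression) -> Action:
    """Return the possible action that the trained classifier scores highest."""
    probs = [clf.predict_proba([features(instruction, a)])[0][1]
             for a in possible]
    return possible[probs.index(max(probs))]
```

Under this sketch's assumptions, the classifier would be trained on (instruction, action) pairs labeled by the automatic planning-based annotation the abstract describes, e.g. clf = LogisticRegression().fit(X, y) with y[i] = 1 when the annotation marks the pair's action as the intended one.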
- Subject
Natural Language Interpretation
Multi-Modal Understanding
Action Recognition
Situated Virtual Agent
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/35034
- Publisher
- Association for Computing Machinery
- Journal
- ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3, October 2014
- ISSN
- 2160-6455
- DOI
- 10.1145/2629632
- URL
- http://hdl.handle.net/11336/35034
- Alternative URL
- http://dl.acm.org/citation.cfm?id=2629632
- Format
- application/pdf; application/zip
- FORD subject
- https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1
- Citation
- Benotti, Luciana; Lau, Tessa; Villalba, Martin. "Interpreting Natural Language Instructions Using Language, Vision, and Behavior." ACM Transactions on Interactive Intelligent Systems 4(3), October 2014.