Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Authors
Benotti, Luciana; Lau, Tessa; Villalba, Martin
Year of publication
2014
Language
English
Resource type
article
Status
published version
Description
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
Affiliation: Benotti, Luciana. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física. Sección Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lau, Tessa. Savioke; United States
Affiliation: Villalba, Martin. Universität Potsdam; Germany. Universidad Nacional de Córdoba; Argentina
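The core idea in the abstract, casting situated instruction interpretation as classification over the actions possible in the current situation, scored with language, vision, and behavior evidence, can be illustrated with a minimal sketch. This is not the authors' code; the `Action` fields, the word-overlap language feature, and the scoring weights are illustrative assumptions.

```python
# Hypothetical sketch of "interpretation as classification over possible
# actions": each candidate action is scored by combining language evidence
# (word overlap with the instruction), vision evidence (target visibility),
# and behavior evidence (how often human followers chose it here).

from dataclasses import dataclass


@dataclass
class Action:
    name: str              # e.g. "open blue door"
    visible: bool = False  # vision: is the action's target currently visible?
    frequency: float = 0.0 # behavior: relative frequency among human followers


def score(instruction: str, action: Action) -> float:
    """Combine the three evidence sources into one score.

    The weights (1.0 for visibility, raw frequency) are placeholders,
    not values from the paper; the paper trains classifiers instead.
    """
    words = set(instruction.lower().split())
    overlap = len(words & set(action.name.lower().split()))  # language signal
    return overlap + (1.0 if action.visible else 0.0) + action.frequency


def interpret(instruction: str, possible_actions: list[Action]) -> Action:
    """Choose the most plausible action among those executable now."""
    return max(possible_actions, key=lambda a: score(instruction, a))


if __name__ == "__main__":
    actions = [
        Action("open blue door", visible=True, frequency=0.6),
        Action("push red button", visible=False, frequency=0.1),
    ]
    print(interpret("please open the blue door", actions).name)
```

Restricting the choice to actions that are possible in the situation (here, the `possible_actions` list) is what the paper's planning-based annotation enables; a learned classifier would replace the hand-set weights above.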
Subject
Natural Language Interpretation
Multi-Modal Understanding
Action Recognition
Situated Virtual Agent
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identifier
oai:ri.conicet.gov.ar:11336/35034

Publisher
Association for Computing Machinery
Journal
ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3, October 2014
ISSN
2160-6455
DOI
https://doi.org/10.1145/2629632
Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Links
http://hdl.handle.net/11336/35034
http://dl.acm.org/citation.cfm?id=2629632
Format
application/pdf