Interpreting Natural Language Instructions Using Language, Vision, and Behavior
- Authors
- Benotti, Luciana; Lau, Tessa; Villalba, Martin
- Publication year
- 2014
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. The approach is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
Affiliation: Benotti, Luciana. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física. Sección Ciencias de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lau, Tessa. Savioke; United States
Affiliation: Villalba, Martin. Universitat Potsdam; Germany. Universidad Nacional de Córdoba; Argentina
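As an editor's illustration of the classification framing described in the abstract, here is a minimal Python sketch: each action that is possible in the current situation is scored from language, vision, and behavior features, and the highest-scoring action is chosen. The Action fields, the three concrete features, and the logistic-regression classifier are assumptions made for this sketch, not the implementation evaluated in the paper.

```python
# Hypothetical sketch (not the authors' code): situated instruction
# interpretation cast as classification over the actions possible in the
# situation, combining language, vision, and behavior information.
from dataclasses import dataclass

from sklearn.linear_model import LogisticRegression


@dataclass
class Action:
    name: str             # e.g. "open(door-3)" -- hypothetical identifier
    description: str      # words describing the action and its target
    target_visible: bool  # vision: is the target in the follower's view?
    behavior_freq: float  # behavior: how often followers did this action next


def features(instruction: str, action: Action) -> list[float]:
    """Combine the three information sources into one feature vector."""
    inst_words = set(instruction.lower().split())
    desc_words = set(action.description.lower().split())
    # Language feature: word overlap between instruction and action description.
    overlap = len(inst_words & desc_words) / max(len(desc_words), 1)
    return [overlap, float(action.target_visible), action.behavior_freq]


def interpret(instruction: str, possible: list[Action],
              clf: LogisticRegression) -> Action:
    """Return the possible action that the trained classifier scores highest."""
    probs = [clf.predict_proba([features(instruction, a)])[0][1]
             for a in possible]
    return possible[probs.index(max(probs))]
```

Under this sketch's assumptions, the classifier would be trained on (instruction, action) pairs labeled by the automatic planning-based annotation the abstract describes, e.g. clf = LogisticRegression().fit(X, y) with y[i] = 1 when the annotation marks the pair's action as the intended one.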
- Subject
Natural Language Interpretation
Multi-Modal Understanding
Action Recognition
Situated Virtual Agent
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/35034
- Publisher
- Association for Computing Machinery
- Journal
- ACM Transactions on Interactive Intelligent Systems, vol. 4, no. 3, October 2014
- ISSN
- 2160-6455
- DOI
- 10.1145/2629632
- URL
- http://hdl.handle.net/11336/35034
- Alternative URL
- http://dl.acm.org/citation.cfm?id=2629632
- Format
- application/pdf; application/zip
- FORD subject
- https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1
- Citation
- Benotti, Luciana; Lau, Tessa; Villalba, Martin. "Interpreting Natural Language Instructions Using Language, Vision, and Behavior." ACM Transactions on Interactive Intelligent Systems 4(3), October 2014.