Improving interactive reinforcement learning: What makes a good teacher?

Authors
Cruz, Francisco; Magg, Sven; Nagai, Yukie; Wermter, Stefan
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. One variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and thereby reduces the search space through advice. On some occasions, the trainer may be another artificial agent that was itself trained using reinforcement learning methods and afterward becomes an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others as a trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward, faster convergence of the reward signal, and more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters to determine how influential they are in the apprenticeship process, and find that the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
Sociedad Argentina de Informática e Investigación Operativa
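The advice loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `feedback_prob` parameter, the greedy fallback, and the way inconsistent advice is drawn are all assumptions made for the example. Only the notions of feedback consistency and learner obedience come from the abstract.

```python
import random


def trainer_advice(best_action, n_actions, feedback_prob, consistency, rng):
    """Return an advised action, or None when the trainer stays silent.

    Hypothetical model: the trainer speaks with probability `feedback_prob`;
    when it speaks, the advice equals its own best action with probability
    `consistency`, otherwise it is a different action chosen at random.
    """
    if rng.random() >= feedback_prob:
        return None  # no advice on this step
    if rng.random() < consistency:
        return best_action  # consistent (correct) advice
    wrong = [a for a in range(n_actions) if a != best_action]
    # With a single available action there is nothing else to advise.
    return rng.choice(wrong) if wrong else best_action


def choose_action(q_values, advice, obedience, rng):
    """Select the learner's action for one step.

    The learner follows the advice with probability `obedience`;
    otherwise it acts greedily on its own Q-values (an assumed fallback).
    """
    if advice is not None and rng.random() < obedience:
        return advice
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]  # learner's current action values (made up)
    advice = trainer_advice(best_action=2, n_actions=3,
                            feedback_prob=0.5, consistency=0.8, rng=rng)
    print(advice, choose_action(q, advice, obedience=0.7, rng=rng))
```

Separating the two functions keeps the trainer-side parameter (consistency) apart from the learner-side one (obedience), which makes it easy to vary them independently in an experiment.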
Subject
Computer Science
interactive reinforcement learning
policy shaping
artificial trainer-agent
cleaning scenario
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/70699

Publication date
2018-09
Handle
http://sedici.unlp.edu.ar/handle/10915/70699
Related identifiers
URL: http://47jaiio.sadio.org.ar/sites/default/files/ASAI-09.pdf
ISSN: 2451-7585
DOI (reference): 10.1080/09540091.2018.1443318
Format
application/pdf