Improving interactive reinforcement learning: What makes a good teacher?
- Authors
- Cruz, Francisco; Magg, Sven; Nagai, Yukie; Wermter, Stefan
- Publication year
- 2018
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. In this regard, a variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and, by doing so, reduces the search space through advice. On some occasions, the trainer may be another artificial agent which was in turn trained using reinforcement learning methods before becoming an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others as a trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward and faster convergence of the reward signal, and also to more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, finding that the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
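The abstract describes policy shaping, where a trainer-agent proposes the learner's next action, and studies how the trainer's feedback consistency interacts with the learner's obedience. The following is a minimal, hypothetical sketch of such a loop, not the paper's implementation: it assumes a tabular Q-learning learner, a toy environment exposing reset() and step(), and illustrative parameter names (advice_prob, obedience, consistency) that are not taken from the paper.

```python
import random
from collections import defaultdict

class Trainer:
    """A previously trained agent whose greedy policy is used as advice."""
    def __init__(self, q_table, actions, consistency=0.9):
        self.q = q_table                # Q-values from the trainer's own training
        self.actions = actions
        self.consistency = consistency  # probability the advice is correct

    def advise(self, state):
        # With probability `consistency`, return the trainer's greedy action;
        # otherwise return a random (possibly wrong) action.
        if random.random() < self.consistency:
            return max(self.actions, key=lambda a: self.q[(state, a)])
        return random.choice(self.actions)

def interactive_q_learning(env, trainer, actions, episodes=500,
                           alpha=0.1, gamma=0.9, epsilon=0.1,
                           advice_prob=0.3, obedience=0.8):
    """Q-learning in which the trainer occasionally proposes the next action."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy shaping step: sometimes take the trainer's advice
            # instead of the learner's own epsilon-greedy choice.
            if random.random() < advice_prob and random.random() < obedience:
                action = trainer.advise(state)                      # follow advice
            elif random.random() < epsilon:
                action = random.choice(actions)                     # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = env.step(action)
            # Standard temporal-difference update.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q
```

In this toy loop, raising `consistency` improves the quality of every piece of advice the learner accepts, while `obedience` only changes how often advice is taken at all, which is one way to read the abstract's finding that feedback consistency matters more than the learner's obedience parameter.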
- Publisher
- Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Ciencias Informáticas
interactive reinforcement learning
policy shaping
artificial trainer-agent
cleaning scenario
- Accessibility level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-sa/3.0/ (Creative Commons Attribution-ShareAlike 3.0 Unported, CC BY-SA 3.0)
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/70699
- Identifiers
- http://sedici.unlp.edu.ar/handle/10915/70699
- http://47jaiio.sadio.org.ar/sites/default/files/ASAI-09.pdf
- ISSN: 2451-7585
- DOI: 10.1080/09540091.2018.1443318
- Date
- 2018-09