Improving interactive reinforcement learning: What makes a good teacher?

Authors: Cruz, Francisco; Magg, Sven; Naga, Yukie; Wermter, Stefan
Publication year: 2018
Language: English
Resource type: conference paper
Status: published version
Description: Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. In this regard, a variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and, by doing so, reduces the search space through advice. On some occasions, the trainer may be another artificial agent, which in turn was trained using reinforcement learning methods to afterward become an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent, as compared to a specialist agent, as an advisor leads to a larger reward and faster convergence of the reward signal, and also to more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
interactive reinforcement learning
policy shape
artificial trainer-agent
cleaning scenario
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-sa/3.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/70699

id	SEDICI_121d887542cebcc5937df1fa24676a29
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/70699
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Improving interactive reinforcement learning: What makes a good teacher?
dc.creator.none.fl_str_mv	Cruz, Francisco Magg, Sven Naga, Yukie Wermter, Stefan
author	Cruz, Francisco
author_facet	Cruz, Francisco Magg, Sven Naga, Yukie Wermter, Stefan
author_role	author
author2	Magg, Sven Naga, Yukie Wermter, Stefan
author2_role	author author author
dc.subject.none.fl_str_mv	Ciencias Informáticas interactive reinforcement learning policy shape artificial trainer-agent cleaning scenario
topic	Ciencias Informáticas interactive reinforcement learning policy shape artificial trainer-agent cleaning scenario
publishDate	2018
dc.date.none.fl_str_mv	2018-09
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Resumen http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/70699
url	http://sedici.unlp.edu.ar/handle/10915/70699
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/http://47jaiio.sadio.org.ar/sites/default/files/ASAI-09.pdf info:eu-repo/semantics/altIdentifier/issn/2451-7585 info:eu-repo/semantics/reference/doi/10.1080/09540091.2018.1443318
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
