Using combination of actions in reinforcement learning

Autores
Karanik, Marcelo J.; Gramajo, Sergio D.
Año de publicación
2010
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
Software agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases the selection of particular agent architecture determines the behaviour in response to the different problem states However, there are some problem domains in which it is desirable that the agent learns a good action execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning and it is useful in the process control area. Given a problem state, the agent selects the adequate action to do and receives an immediate reward, then estimations about every action are updated and, after a certain period of time, the agent learns which the best action to be executed is. Most reinforcement learning algorithms perform simple actions while two or more are capable of being used. This work involves the use of RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism to combine actions of different types.
Facultad de Informática
Materia
Ciencias Informáticas
Learning
SARSA
optimal policy
action combination
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by-nc/3.0/
Repositorio
SEDICI (UNLP)
Institución
Universidad Nacional de La Plata
OAI Identificador
oai:sedici.unlp.edu.ar:10915/9663

id SEDICI_666df05b8d1b49f697af09d589b78cf9
oai_identifier_str oai:sedici.unlp.edu.ar:10915/9663
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
spelling Using combination of actions in reinforcement learningKaranik, Marcelo J.Gramajo, Sergio D.Ciencias InformáticasLearningSARSAoptimal policyaction combinationSoftware agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases the selection of particular agent architecture determines the behaviour in response to the different problem states However, there are some problem domains in which it is desirable that the agent learns a good action execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning and it is useful in the process control area. Given a problem state, the agent selects the adequate action to do and receives an immediate reward, then estimations about every action are updated and, after a certain period of time, the agent learns which the best action to be executed is. Most reinforcement learning algorithms perform simple actions while two or more are capable of being used. This work involves the use of RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism to combine actions of different types.Facultad de Informática2010-04info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionArticulohttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdf19-23http://sedici.unlp.edu.ar/handle/10915/9663enginfo:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Apr10-4.pdfinfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc/3.0/Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-10-15T10:43:21Zoai:sedici.unlp.edu.ar:10915/9663Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-10-15 10:43:21.827SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv Using combination of actions in reinforcement learning
title Using combination of actions in reinforcement learning
spellingShingle Using combination of actions in reinforcement learning
Karanik, Marcelo J.
Ciencias Informáticas
Learning
SARSA
optimal policy
action combination
title_short Using combination of actions in reinforcement learning
title_full Using combination of actions in reinforcement learning
title_fullStr Using combination of actions in reinforcement learning
title_full_unstemmed Using combination of actions in reinforcement learning
title_sort Using combination of actions in reinforcement learning
dc.creator.none.fl_str_mv Karanik, Marcelo J.
Gramajo, Sergio D.
author Karanik, Marcelo J.
author_facet Karanik, Marcelo J.
Gramajo, Sergio D.
author_role author
author2 Gramajo, Sergio D.
author2_role author
dc.subject.none.fl_str_mv Ciencias Informáticas
Learning
SARSA
optimal policy
action combination
topic Ciencias Informáticas
Learning
SARSA
optimal policy
action combination
dc.description.none.fl_txt_mv Software agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases the selection of particular agent architecture determines the behaviour in response to the different problem states However, there are some problem domains in which it is desirable that the agent learns a good action execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning and it is useful in the process control area. Given a problem state, the agent selects the adequate action to do and receives an immediate reward, then estimations about every action are updated and, after a certain period of time, the agent learns which the best action to be executed is. Most reinforcement learning algorithms perform simple actions while two or more are capable of being used. This work involves the use of RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism to combine actions of different types.
Facultad de Informática
description Software agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases the selection of particular agent architecture determines the behaviour in response to the different problem states However, there are some problem domains in which it is desirable that the agent learns a good action execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning and it is useful in the process control area. Given a problem state, the agent selects the adequate action to do and receives an immediate reward, then estimations about every action are updated and, after a certain period of time, the agent learns which the best action to be executed is. Most reinforcement learning algorithms perform simple actions while two or more are capable of being used. This work involves the use of RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism to combine actions of different types.
publishDate 2010
dc.date.none.fl_str_mv 2010-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Articulo
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/9663
url http://sedici.unlp.edu.ar/handle/10915/9663
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Apr10-4.pdf
info:eu-repo/semantics/altIdentifier/issn/1666-6038
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc/3.0/
Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc/3.0/
Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
dc.format.none.fl_str_mv application/pdf
19-23
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
reponame_str SEDICI (UNLP)
collection SEDICI (UNLP)
instname_str Universidad Nacional de La Plata
instacron_str UNLP
institution UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1846063848420802560
score 13.22299