Using combination of actions in reinforcement learning
- Authors
- Karanik, Marcelo J.; Gramajo, Sergio D.
- Year of publication
- 2010
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Software agents are programs that can observe their environment and act in an attempt to reach their design goals. In most cases, the selection of a particular agent architecture determines the behaviour in response to the different problem states. However, there are some problem domains in which it is desirable for the agent to learn a good action-execution policy by interacting with its environment. This kind of learning is called Reinforcement Learning (RL), and it is useful in the process control area. Given a problem state, the agent selects an adequate action to perform and receives an immediate reward; the estimates for every action are then updated and, over time, the agent learns which action is best to execute. Most reinforcement learning algorithms execute a single action at a time, even when two or more could be applied together. This work applies RL algorithms to find an optimal policy in a gridworld problem and proposes a mechanism for combining actions of different types.
- Facultad de Informática
- Subject
- Computer Science; Learning; SARSA; optimal policy; action combination
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc/3.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/9663
- Identifier
- http://sedici.unlp.edu.ar/handle/10915/9663
- Alternative identifiers
- http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Apr10-4.pdf; ISSN 1666-6038
- Date
- 2010-04
- Format
- application/pdf; pp. 19-23
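
The description above outlines the learning loop behind SARSA on a gridworld: observe a state, pick an action, receive an immediate reward, and update the action-value estimates until a good policy emerges. The sketch below is illustrative only and does not reproduce the combination mechanism proposed in the paper: it runs tabular SARSA over a hypothetical combined action space in which each action pairs a movement direction with a step size, so two action types are applied at once. The grid size, reward values, and parameters are assumptions made for this sketch.

```python
# Illustrative only: tabular SARSA on a small gridworld with a combined
# action space. The grid size, reward values, and the (direction, step size)
# pairing are assumptions for this sketch, not taken from the paper.
import random

ROWS, COLS = 4, 4
GOAL = (3, 3)
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
STEPS = [1, 2]                                          # hypothetical second action type
ACTIONS = [(d, s) for d in DIRECTIONS for s in STEPS]   # combined actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Action-value table Q(s, a) for every (cell, combined action) pair.
Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in ACTIONS}

def step(state, action):
    """Apply a combined (direction, step size) action; -1 per move, +10 at the goal."""
    (dr, dc), size = DIRECTIONS[action[0]], action[1]
    r = min(max(state[0] + dr * size, 0), ROWS - 1)
    c = min(max(state[1] + dc * size, 0), COLS - 1)
    nxt = (r, c)
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

def choose(state):
    """Epsilon-greedy selection over the combined action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(500):                                    # training episodes
    state = (0, 0)
    action = choose(state)
    done = False
    while not done:
        nxt, reward, done = step(state, action)
        nxt_action = choose(nxt)
        # SARSA update: on-policy, bootstraps from the action actually chosen next.
        target = reward + (0.0 if done else GAMMA * Q[(nxt, nxt_action)])
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state, action = nxt, nxt_action

# Greedy policy after learning: the preferred combined action for each cell.
policy = {(r, c): max(ACTIONS, key=lambda a: Q[((r, c), a)])
          for r in range(ROWS) for c in range(COLS)}
print(policy[(0, 0)])
```

The same loop works for any finite combined action set; only ACTIONS and step() would need to change to model a different pairing of action types.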