Data vs. information: using clustering techniques to enhance stock returns forecasting

Authors
Vásquez Sáenz, Javier; Quiroga, Facundo Manuel; Fernández Bariviera, Aurelio
Publication year
2023
Language
English
Resource type
article
Status
published version
Description
This paper explores the use of clustering models of stocks to improve both (a) the prediction of stock prices and (b) the returns of trading algorithms. We cluster stocks using k-means and several alternative distance metrics, using quarterly financial ratios, prices and daily returns as features. Then, for each cluster, we train ARIMA and LSTM forecasting models to predict the daily price of each stock in the cluster. Finally, we employ the clustering-empowered forecasting models to analyze the returns of different trading algorithms. We obtain three key results: (i) LSTM models outperform ARIMA and benchmark models, obtaining positive investment returns in several scenarios; (ii) forecasting is improved by using the additional information provided by the clustering methods, so selecting relevant data is an important preprocessing task in the forecasting process; (iii) using information from the whole sample of stocks deteriorates the forecasting ability of LSTM models. These results have been validated using data from 240 companies in the Russell 3000 index spanning 2017 to 2022, training and testing with different subperiods.
Instituto de Investigación en Informática
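The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tickers and return series are made up, the distance is plain Euclidean (the paper also considers alternative metrics), and the farthest-point initialization is a choice made here only to keep the example deterministic. The forecasting models (ARIMA, LSTM) are not implemented; the sketch only shows how stocks would be grouped before one model is trained per cluster.

```python
import math
from collections import defaultdict


def kmeans(points, k, n_iter=100):
    """Plain k-means with Euclidean distance; returns one label per point."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_by_cluster(tickers, labels):
    """Map each cluster label to the list of tickers assigned to it."""
    clusters = defaultdict(list)
    for ticker, label in zip(tickers, labels):
        clusters[label].append(ticker)
    return dict(clusters)


# Hypothetical tickers with made-up five-day return series (not real data).
returns = {
    "AAA": [0.010, 0.012, 0.009, 0.011, 0.010],
    "BBB": [0.011, 0.010, 0.010, 0.012, 0.009],
    "CCC": [-0.020, -0.018, -0.021, -0.019, -0.022],
    "DDD": [-0.019, -0.021, -0.020, -0.018, -0.020],
    "EEE": [0.009, 0.011, 0.010, 0.010, 0.011],
}

tickers = list(returns)
labels = kmeans([returns[t] for t in tickers], k=2)
clusters = group_by_cluster(tickers, labels)
print(clusters)
```

With this toy data the three tickers with positive returns fall into one cluster and the two with negative returns into the other. In the setting the abstract describes, the series of each cluster would then form the training pool for that cluster's forecasting model, rather than pooling the whole sample.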
Subject
Computer Science
Stock price forecast
Clustering
Financial Reports
Deep learning
Investment algorithms
Trading
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/160256

URL
http://sedici.unlp.edu.ar/handle/10915/160256
ISSN
1057-5219
DOI
10.1016/j.irfa.2023.102657