Data vs. information: using clustering techniques to enhance stock returns forecasting

Authors: Vásquez Sáenz, Javier; Quiroga, Facundo Manuel; Fernández Bariviera, Aurelio
Publication year: 2023
Language: English
Resource type: article
Status: published version
Description: This paper explores the use of clustering models of stocks to improve both (a) the prediction of stock prices and (b) the returns of trading algorithms. We cluster stocks using k-means and several alternative distance metrics, using as features quarterly financial ratios, prices and daily returns. Then, for each cluster, we train ARIMA and LSTM forecasting models to predict the daily price of each stock in the cluster. Finally, we employ the clustering-empowered forecasting models to analyze the returns of different trading algorithms. We obtain three key results: (i) LSTM models outperform ARIMA and benchmark models, obtaining positive investment returns in several scenarios; (ii) forecasting is improved by using the additional information provided by the clustering methods, so selecting relevant data is an important preprocessing task in the forecasting process; (iii) using information from the whole sample of stocks deteriorates the forecasting ability of LSTM models. These results have been validated using data from 240 companies in the Russell 3000 index spanning 2017 to 2022, training and testing with different subperiods.
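The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Lloyd's k-means with Euclidean distance (the paper also considers alternative distance metrics), and the tickers, feature columns and values are made-up placeholders. The helper names `standardize`, `kmeans` and `group_by_cluster` are hypothetical.

```python
import numpy as np


def standardize(features):
    """Z-score each column so no single feature dominates the distance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)


def kmeans(features, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of the feature matrix.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every stock to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def group_by_cluster(tickers, labels):
    """Map each cluster id to the list of tickers assigned to it."""
    groups = {}
    for ticker, label in zip(tickers, labels):
        groups.setdefault(int(label), []).append(ticker)
    return groups


# Made-up example: rows are hypothetical tickers, columns are two
# hypothetical features (e.g. a financial ratio and a return statistic).
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
features = np.array([
    [0.10, 1.2],
    [0.12, 1.1],
    [0.11, 1.3],
    [0.90, 4.0],
    [0.95, 4.2],
    [0.88, 3.9],
])

labels, _ = kmeans(standardize(features), k=2)
clusters = group_by_cluster(tickers, labels)
print(clusters)
```

Each list in `clusters` would then define the set of stocks whose data is used to train one forecasting model (ARIMA or LSTM in the paper); that training step is not shown here.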
Instituto de Investigación en Informática
Subject: Computer Science
Stock price forecast
Clustering
Financial Reports
Deep learning
Investment algorithms
Trading
Access level: open access
Terms of use: http://creativecommons.org/licenses/by/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/160256

id	SEDICI_4d0bc25dc58d3c146c3b0a0003f7fb65
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/160256
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Data vs. information: using clustering techniques to enhance stock returns forecasting
dc.creator.none.fl_str_mv	Vásquez Sáenz, Javier Quiroga, Facundo Manuel Fernández Bariviera, Aurelio
dc.subject.none.fl_str_mv	Ciencias Informáticas Stock price forecast Clustering Financial Reports Deep learning Investment algorithms Trading
publishDate	2023
dc.date.none.fl_str_mv	2023
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Articulo http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/160256
url	http://sedici.unlp.edu.ar/handle/10915/160256
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/issn/1057-5219 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.irfa.2023.102657
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International (CC BY 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International (CC BY 4.0)
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
