Data vs. information: using clustering techniques to enhance stock returns forecasting
- Authors
- Vásquez Sáenz, Javier; Quiroga, Facundo Manuel; Fernández Bariviera, Aurelio
- Publication year
- 2023
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- This paper explores the use of clustering models of stocks to improve both (a) the prediction of stock prices and (b) the returns of trading algorithms. We cluster stocks using k-means and several alternative distance metrics, using as features quarterly financial ratios, prices and daily returns. Then, for each cluster, we train ARIMA and LSTM forecasting models to predict the daily price of each stock in the cluster. Finally, we employ the clustering-empowered forecasting models to analyze the returns of different trading algorithms. We obtain three key results: (i) LSTM models outperform ARIMA and benchmark models, obtaining positive investment returns in several scenarios; (ii) forecasting is improved by using the additional information provided by the clustering methods, therefore selecting relevant data is an important preprocessing task in the forecasting process; (iii) using information from the whole sample of stocks deteriorates the forecasting ability of LSTM models. These results have been validated using data of 240 companies of the Russell 3000 index spanning 2017 to 2022, training and testing with different subperiods.
Instituto de Investigación en Informática
- Subject
- Ciencias Informáticas; Stock price forecast; Clustering; Financial Reports; Deep learning; Investment algorithms; Trading
- Accessibility level
- open access
- Terms of use
- http://creativecommons.org/licenses/by/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/160256
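The pipeline summarized in the description (cluster the stocks first, then fit one forecasting model per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tickers and feature values are made up, the k-means uses Euclidean distance only (the description mentions several alternative distance metrics), and the per-cluster "model" is a pooled mean of past returns standing in for the ARIMA and LSTM models the authors actually train.

```python
import math
from collections import defaultdict


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means on equal-length feature vectors (Euclidean distance)."""
    # Deterministic initialisation (an assumption): the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_by_cluster(tickers, labels):
    """Map each cluster label to the list of tickers assigned to it."""
    clusters = defaultdict(list)
    for ticker, lab in zip(tickers, labels):
        clusters[lab].append(ticker)
    return dict(clusters)


def cluster_mean_forecast(returns_by_ticker, members):
    """Placeholder forecaster (NOT ARIMA/LSTM): mean of the cluster's pooled past returns."""
    pooled = [r for t in members for r in returns_by_ticker[t]]
    return sum(pooled) / len(pooled)


# Hypothetical tickers with toy features: (mean daily return, volatility).
tickers = ["AAA", "BBB", "CCC", "DDD"]
features = [(0.001, 0.010), (0.002, 0.012), (0.020, 0.300), (0.018, 0.280)]
returns_by_ticker = {
    "AAA": [0.001, 0.002], "BBB": [0.003, 0.002],
    "CCC": [0.050, -0.010], "DDD": [0.040, -0.020],
}

labels = kmeans(features, k=2)
clusters = group_by_cluster(tickers, labels)
# One forecast per cluster, built only from that cluster's own history.
forecasts = {
    lab: cluster_mean_forecast(returns_by_ticker, members)
    for lab, members in clusters.items()
}
```

The design point the sketch tries to convey is the one stated in the description: each forecaster sees only the data of stocks that the clustering step judged similar, rather than the whole sample.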
| id | SEDICI_4d0bc25dc58d3c146c3b0a0003f7fb65 |
|---|---|
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/160256 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title.none.fl_str_mv | Data vs. information: using clustering techniques to enhance stock returns forecasting |
| dc.creator.none.fl_str_mv | Vásquez Sáenz, Javier; Quiroga, Facundo Manuel; Fernández Bariviera, Aurelio |
| author | Vásquez Sáenz, Javier |
| author_role | author |
| author2 | Quiroga, Facundo Manuel; Fernández Bariviera, Aurelio |
| author2_role | author; author |
| dc.subject.none.fl_str_mv | Ciencias Informáticas; Stock price forecast; Clustering; Financial Reports; Deep learning; Investment algorithms; Trading |
| publishDate | 2023 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Articulo; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/160256 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/issn/1057-5219; info:eu-repo/semantics/altIdentifier/doi/10.1016/j.irfa.2023.102657 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0/; Creative Commons Attribution 4.0 International (CC BY 4.0) |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| instname_str | Universidad Nacional de La Plata |
| instacron_str | UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |