Exploratory analysis of SCADA data from wind turbines using the K-means clustering algorithm for predictive maintenance purposes

Authors
Cosa Rodriguez, Pablo; Marti Puig, Pere; Caiafa, Cesar Federico; Serra Serra, Moisès; Cusidó, Jordi; Solé Casals, Jordi
Year of publication
2023
Language
English
Resource type
article
Status
published version
Description
Product maintenance costs over a product's lifetime can account for 30–60% of total operating costs, which makes effective maintenance strategies necessary. The problem is not only economic but also environmental, since breakdowns are also responsible for greenhouse gas emissions. Industrial maintenance is a set of technical and organisational measures intended to sustain the functionality of equipment and keep machines in optimal condition over time, with the aim of saving costs, extending the useful life of the machines, saving energy, maximising production and availability, ensuring the quality of the product obtained, ensuring workplace safety for technicians, preserving the environment, and reducing emissions as much as possible. Machine learning techniques can be used to detect or predict faults in wind turbines. However, labelled data are problematic in this application: alarms are usually not clearly associated with a specific fault, some labels are wrongly associated with a problem, and the classes are clearly imbalanced. To avoid relying on labelled data, we investigate clustering, specifically K-means, together with boxplot representations of the variables, across a set of six different tests. Experimental results show that in some cases the clustering and boxplot techniques make it possible to detect outliers or identify erroneous behaviour of the wind turbines. These cases can then be investigated in detail by a specialist, enabling more efficient predictive maintenance.
Instituto Argentino de Radioastronomía
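The abstract does not detail the pipeline, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: K-means groups SCADA records into operating regimes, and a boxplot rule (Tukey's whiskers at 1.5 × IQR, the usual default in plotting libraries) then flags values outside the whiskers within each regime. The data are synthetic and hypothetical (wind speed, active power and gearbox temperature for two regimes, with three injected overheating events). The K-means here is a plain NumPy implementation with deterministic farthest-first seeding, not any particular library's API. The choice of variables, k = 2 and the per-cluster boxplot are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    # Farthest-first seeding: start from sample 0, then repeatedly add the
    # sample farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


def boxplot_outliers(values, whisker=1.5):
    """Flag values outside Tukey's boxplot whiskers [Q1 - w*IQR, Q3 + w*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - whisker * iqr) | (values > q3 + whisker * iqr)


# Hypothetical SCADA-like records (synthetic): wind speed (m/s),
# active power (kW), gearbox temperature (deg C) for two operating regimes.
rng = np.random.default_rng(42)
low = np.column_stack([rng.normal(4, 0.5, 200), rng.normal(200, 30, 200), rng.normal(50, 2, 200)])
high = np.column_stack([rng.normal(12, 0.5, 200), rng.normal(1800, 50, 200), rng.normal(60, 2, 200)])
data = np.vstack([low, high])
fault_idx = [210, 250, 300]
data[fault_idx, 2] = 80.0  # injected overheating events (synthetic)

# Cluster on standardised wind speed and power so both features weigh equally.
features = data[:, :2]
z = (features - features.mean(axis=0)) / features.std(axis=0)
_, labels = kmeans(z, k=2)

# Within each operating regime, flag gearbox temperatures outside the whiskers.
flagged = np.zeros(len(data), dtype=bool)
for j in np.unique(labels):
    members = labels == j
    flagged[members] = boxplot_outliers(data[members, 2])

print("cluster sizes:", np.bincount(labels))
print("flagged rows:", np.flatnonzero(flagged))
```

Conditioning the boxplot on the cluster matters in this sketch. A temperature that is ordinary at full load could be anomalous at low wind, and a single boxplot over both regimes would have wider whiskers and could miss it. The flagged rows are candidates for a specialist to review, not confirmed faults.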
Subject
Engineering
Computer science
Predictive maintenance
Prognosis
Machine learning
K-means
Clustering
SCADA data
Renewable energies
Wind turbine
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/152530
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
DOI
10.3390/machines11020270
ISSN
2075-1702
URL
http://sedici.unlp.edu.ar/handle/10915/152530
Format
application/pdf