Performance analysis of the Survival-SVM classifier applied to gene-expression databases

Authors
Camele, Genaro; Hasperué, Waldo
Publication year
2023
Language
Spanish (Castilian)
Resource type
conference paper
Status
published version
Description
The analysis of epigenetic information for patient diagnosis and prognosis has gained relevance in recent years, as technological progress has lowered the cost of extracting and processing this information. One of the most common tasks in this area is building models that use a patient's epigenetic information to make inferences about survival. Optimizing these models has therefore become a problem of great interest. This article evaluates several metrics and the execution times of the Survival Support Vector Machines model in survival analysis applied to gene-expression databases. Experiments were run with varying numbers of genes used for training, to measure how model performance relates to data growth. The results showed that the linear and polynomial kernels offer a better balance between execution time and predictive power when fewer than 2000 genes are evaluated, while the cosine and RBF kernels are better candidates otherwise.
Instituto de Investigación en Informática
Red de Universidades con Carreras en Informática
Subject
Computer Science
Survival analysis
Survival Support Vector Machines
Regression, Performance
Apache Spark
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/164807

Publication date
2023-10
Format
application/pdf
Pages
97-105
URL
http://sedici.unlp.edu.ar/handle/10915/164807
ISBN
978-987-9285-51-0
Related record
https://sedici.unlp.edu.ar/handle/10915/163107
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)