A novel clustering approach for biological data using a new distance based on Gene Ontology

Authors
Leale, Guillermo; Milone, Diego H.; Bayá, Ariel E.; Granitto, Pablo Miguel; Stegmayer, Georgina
Year of publication
2013
Language
English
Resource type
conference document
Status
published version
Description
When clustering algorithms are applied to biological data, information about biological processes is usually not represented explicitly, even though biologists later rely on this knowledge to validate the clusters and the relations found in the data. This work presents a new distance measure for biological data that combines expression and semantic information and is intended for use within a clustering algorithm. The distance is computed for every pair of genes and is incorporated into the training process of the clustering algorithm. The approach was evaluated on two real datasets using several validation measures. The results are consistent across all measures and show better semantic quality for the clusters produced by the new algorithm than for those produced by standard clustering.
Sociedad Argentina de Informática e Investigación Operativa
Subject
Computer Science
biological data
clustering algorithm
measures
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/76213

id SEDICI_2b7aa2ffda30c813ed9067fb2a3d29e3
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
publishDate 2013
dc.date.none.fl_str_mv 2013-09
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/76213
url http://sedici.unlp.edu.ar/handle/10915/76213
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/issn/1850-2784
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-sa/4.0/
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.format.none.fl_str_mv application/pdf
85-96
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar