Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Authors
Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Predrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis
Year of publication
2008
Language
English
Resource type
article
Status
published version
Description
Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant to protein-protein interaction (interaction article subtask [IAS]), discovery of interacting protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interactions (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. For comparison, we also applied a support vector machine and singular value decomposition to the same features. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks.
Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task on the performance measures used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool built with this approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks achieved one of the highest recall rates, as well as one of the highest mean reciprocal ranks of correct passages.
Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the approach is based on a lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, expanding word features with word proximity networks is shown to be useful, although some improvements are still needed, as discussed.
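The description names a spam-detection-inspired linear model but does not spell out its form. As a rough illustration only, here is a generic spam-filter-style linear classifier over binary word-presence features. The weighting (document frequency in relevant abstracts minus document frequency in irrelevant ones), the top-k feature selection, the zero threshold, the function names, and the toy data are all assumptions made for this sketch; it is not the authors' published model and omits the uncertainty-based integration, the SVM/SVD comparisons, and the word proximity networks.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Return the set of lower-cased alphanumeric tokens (binary word presence)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def train_word_weights(pos_docs: list[str], neg_docs: list[str],
                       k: int = 50) -> dict[str, float]:
    """Hypothetical weighting: per-word document frequency in relevant abstracts
    minus document frequency in irrelevant ones; keep the k largest in magnitude."""
    pos_df = Counter(w for d in pos_docs for w in tokenize(d))
    neg_df = Counter(w for d in neg_docs for w in tokenize(d))
    weights = {
        w: pos_df[w] / len(pos_docs) - neg_df[w] / len(neg_docs)
        for w in pos_df.keys() | neg_df.keys()
    }
    # Sort by descending magnitude, breaking ties alphabetically for determinism.
    top = sorted(weights, key=lambda w: (-abs(weights[w]), w))[:k]
    return {w: weights[w] for w in top}


def score(doc: str, weights: dict[str, float]) -> float:
    """Linear score: sum of the weights of the selected words present in doc."""
    return sum(weights.get(w, 0.0) for w in tokenize(doc))


def is_relevant(doc: str, weights: dict[str, float],
                threshold: float = 0.0) -> bool:
    """Label an abstract as interaction-relevant when its score exceeds threshold."""
    return score(doc, weights) > threshold


if __name__ == "__main__":
    # Toy training data, invented purely for illustration.
    relevant = ["Protein A binds protein B in vitro",
                "The kinase interacts with and phosphorylates the receptor"]
    irrelevant = ["Gene expression was measured in liver tissue",
                  "We sequenced the genome of a bacterial strain"]
    w = train_word_weights(relevant, irrelevant)
    for text in ["Protein C binds receptor D", "Expression of the gene in liver"]:
        print(text, "->", score(text, w), is_relevant(text, w))
```

On this toy data the first test sentence scores positive and the second negative. Because the trained model is only a sparse weight table over words, scoring an abstract takes time linear in its length, which is one reason a model of this kind is cheap to port to similar problems.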
Affiliation: Abi-Haidar, Alaa. Indiana University; United States. Fundação Luso-Americana para o Desenvolvimento; Portugal
Affiliation: Kaur, Jasleen. Indiana University; United States
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Radivojac, Predrag. Indiana University; United States
Affiliation: Rechtsteiner, Andreas. Indiana University; United States
Affiliation: Verspoor, Karin. Los Alamos National High Magnetic Field Laboratory; United States
Affiliation: Wang, Zhiping. Indiana University; United States
Affiliation: Rocha, Luis. Fundação Luso-Americana para o Desenvolvimento; Portugal. Indiana University; United States
Subjects
Support Vector Machine
Singular Value Decomposition
Word Pair
Singular Value Decomposition Method
Proximity Network
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/75086

Field of research (FORD)
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2008-09-01
Handle
http://hdl.handle.net/11336/75086
Citation
Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Predrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Suppl. 2; 2008-09-01; S11-S30
ISSN
1474-760X
Related identifiers
PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/
DOI: 10.1186/gb-2008-9-S2-S11
Publisher page: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s11
Format
application/pdf
Publisher
BioMed Central