Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
- Authors
- Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis
- Year of publication
- 2008
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
Affiliation: Abi-Haidar, Alaa. Indiana University; United States. Fundação Luso-Americana para o Desenvolvimento; Portugal
Affiliation: Kaur, Jasleen. Indiana University; United States
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Radivojac, Pedrag. Indiana University; United States
Affiliation: Rechtsteiner, Andreas. Indiana University; United States
Affiliation: Verspoor, Karin. Los Alamos National High Magnetic Field Laboratory; United States
Affiliation: Wang, Zhiping. Indiana University; United States
Affiliation: Rocha, Luis. Fundação Luso-Americana para o Desenvolvimento; Portugal. Indiana University; United States
- Subject
- Support Vector Machine; Singular Value Decomposition; Word Pair; Singular Value Decomposition Method; Proximity Network
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/75086
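The OAI identifier above can be used to request this record through the OAI-PMH protocol. The following is a minimal sketch, not the repository's documented client: the endpoint URL `http://ri.conicet.gov.ar/oai/request` is an assumption that has not been verified here, and the function names are hypothetical. The `GetRecord` verb and the `oai_dc` metadata prefix are part of the OAI-PMH standard. The sketch only builds URLs and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: OAI-PMH endpoint of the repository (not verified here).
OAI_ENDPOINT = "http://ri.conicet.gov.ar/oai/request"
# OAI identifier as listed in this record.
RECORD_ID = "oai:ri.conicet.gov.ar:11336/75086"


def get_record_url(identifier: str,
                   endpoint: str = OAI_ENDPOINT,
                   metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for one identifier."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{endpoint}?{query}"


def handle_url(identifier: str) -> str:
    """Derive the handle URL, assuming the form oai:<host>:<handle>."""
    handle = identifier.split(":", 2)[2]
    return f"http://hdl.handle.net/{handle}"


if __name__ == "__main__":
    print(get_record_url(RECORD_ID))
    print(handle_url(RECORD_ID))
```

The identifier is percent-encoded by `urlencode`, so the colons and the slash in it are safe to pass as a query parameter.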
Full record metadata
id | CONICETDig_1837bc70aa7a1bf17982618acbb72bdc |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/75086 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks |
dc.creator.none.fl_str_mv | Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis |
dc.subject.none.fl_str_mv | Support Vector Machine; Singular Value Decomposition; Word Pair; Singular Value Decomposition Method; Proximity Network |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate | 2008 |
dc.date.none.fl_str_mv | 2008-09-01 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/75086 Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Suppl. 2; 1-9-2008; S11-S30 1474-760X CONICET Digital CONICET |
url | http://hdl.handle.net/11336/75086 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/ info:eu-repo/semantics/altIdentifier/doi/10.1186/gb-2008-9-S2-S11 info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s11 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf |
dc.publisher.none.fl_str_mv | BioMed Central |
publisher.none.fl_str_mv | BioMed Central |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |