Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Authors: Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis
Publication year: 2008
Language: English
Resource type: article
Status: published version
Description: Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
Affiliation: Abi-Haidar, Alaa. Indiana University; United States. Fundação Luso-Americana para o Desenvolvimento; Portugal
Affiliation: Kaur, Jasleen. Indiana University; United States
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Radivojac, Pedrag. Indiana University; United States
Affiliation: Rechtsteiner, Andreas. Indiana University; United States
Affiliation: Verspoor, Karin. Los Alamos National High Magnetic Field Laboratory; United States
Affiliation: Wang, Zhiping. Indiana University; United States
Affiliation: Rocha, Luis. Fundação Luso-Americana para o Desenvolvimento; Portugal. Indiana University; United States
Subject: Support Vector Machine
Singular Value Decomposition
Word Pair
Singular Value Decomposition Method
Proximity Network
Access level: open access
Terms of use: https://creativecommons.org/licenses/by/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/75086

id	CONICETDig_1837bc70aa7a1bf17982618acbb72bdc
oai_identifier_str	oai:ri.conicet.gov.ar:11336/75086
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
dc.creator.none.fl_str_mv	Abi-Haidar, Alaa Kaur, Jasleen Maguitman, Ana Gabriela Radivojac, Pedrag Rechtsteiner, Andreas Verspoor, Karin Wang, Zhiping Rocha, Luis
author	Abi-Haidar, Alaa
author_facet	Abi-Haidar, Alaa Kaur, Jasleen Maguitman, Ana Gabriela Radivojac, Pedrag Rechtsteiner, Andreas Verspoor, Karin Wang, Zhiping Rocha, Luis
author_role	author
author2	Kaur, Jasleen Maguitman, Ana Gabriela Radivojac, Pedrag Rechtsteiner, Andreas Verspoor, Karin Wang, Zhiping Rocha, Luis
author2_role	author author author author author author author
dc.subject.none.fl_str_mv	Support Vector Machine Singular Value Decomposition Word Pair Singular Value Decomposition Method Proximity Network
topic	Support Vector Machine Singular Value Decomposition Word Pair Singular Value Decomposition Method Proximity Network
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1
publishDate	2008
dc.date.none.fl_str_mv	2008-09-01
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/75086 Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Supl. 2; 1-9-2008; S11-S30 1474-760X CONICET Digital CONICET
url	http://hdl.handle.net/11336/75086
identifier_str_mv	Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Pedrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Supl. 2; 1-9-2008; S11-S30 1474-760X CONICET Digital CONICET
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/ info:eu-repo/semantics/altIdentifier/doi/10.1186/gb-2008-9-S2-S11 info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s11
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	BioMed Central
publisher.none.fl_str_mv	BioMed Central
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_	1858305303813029888
score	13.176822
