PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria

Authors
Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana
Year of publication
2025
Language
English
Resource type
article
Status
published version
Description
Antigenicity prediction plays a crucial role in vaccine development, antibody-based therapies, and diagnostic assays, as it helps assess the potential of molecular structures to induce and recruit immune cells and drive antibody production. Several existing prediction methods, which target complete proteins and epitopes identified through reverse vaccinology, are limited by constraints on input data, by their feature extraction strategies, and by insufficient flexibility for model evaluation and interpretation. This work presents PAPreC (Pipeline for Antigenicity Prediction Comparison), an open-source, versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines three key factors: the selection of training data sets, feature extraction methods (including physicochemical descriptors and ESM-2 encoder-derived embeddings), and diverse classifiers. It provides automated model evaluation, interpretability through SHapley Additive exPlanations (SHAP) analysis, and applicability domain assessments, enabling researchers to identify optimal configurations for their specific data sets. Using IEDB data as a reference, we demonstrate PAPreC's effectiveness across the ESKAPE pathogen group. A case study involving Pseudomonas aeruginosa and Staphylococcus aureus shows that the most suitable feature configuration depends on the sequence type, and that ESM-2 embeddings enhance model performance. Moreover, our results indicate that separate models for Gram-positive and Gram-negative bacteria are not required. PAPreC offers a comprehensive, adaptable, and robust framework to streamline and improve antigenicity prediction for diverse bacterial data sets.
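The abstract does not specify which physicochemical descriptors PAPreC computes. As a hedged illustration only, the sketch below turns a protein or epitope sequence into a fixed-length vector of two common sequence descriptors: amino acid composition and mean Kyte-Doolittle hydropathy (GRAVY). The function name `physicochemical_features` and this particular descriptor set are assumptions for illustration, not PAPreC's actual feature extraction.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def physicochemical_features(seq: str) -> dict[str, float]:
    """Hypothetical descriptor set: composition, GRAVY, and length."""
    # Keep only standard residues; ambiguous codes such as X or B are skipped.
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)
    # Fraction of each standard amino acid (20 features summing to 1).
    features = {f"comp_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    # GRAVY: mean hydropathy over all retained residues.
    features["gravy"] = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Length of the cleaned sequence, stored as float for a uniform vector.
    features["length"] = float(n)
    return features


if __name__ == "__main__":
    vec = physicochemical_features("MKTAYIAKQR")
    print(len(vec), round(vec["gravy"], 3))
```

Every sequence, whether a full protein or a short epitope, maps to the same 22 features, so the output can be fed to any standard classifier. In a comparison workflow like the one described above, such a descriptor vector would be one feature option to evaluate alongside learned embeddings such as ESM-2.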
Affiliation of Martins, Yasmmin C.: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation of Cerqueira e Costa, Maiana O.: not specified
Affiliation of Palumbo, Miranda Clara: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Cálculo; Argentina
Affiliation of Fernández Do Porto, Darío Augusto: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Cálculo; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation of Custódio, Fábio L.: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation of Trevizani, Raphael: Fundación Oswaldo Cruz; Brazil
Affiliation of Nicolás, Marisa Fabiana: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Subjects
Antigenicity prediction
Machine learning classifiers
Feature extraction
SHAP analysis
Access level
open access
Terms of use
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5 AR)
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/282668

Field of research (FORD)
https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
Citation
Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana. PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria. ACS Omega 10(6), 5415-5429 (February 2025). American Chemical Society.
ISSN
2470-1343
DOI
10.1021/acsomega.4c07147
Publisher's version
https://pubs.acs.org/doi/10.1021/acsomega.4c07147
Handle
http://hdl.handle.net/11336/282668
Format
application/pdf