PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria
- Authors
- Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana
- Publication year
- 2025
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Antigenicity prediction plays a crucial role in vaccine development, antibody-based therapies, and diagnostic assays, as this predictive approach helps assess the potential of molecular structures to induce and recruit immune cells and drive antibody production. Several existing prediction methods, which target complete proteins and epitopes identified through reverse vaccinology, face limitations regarding input data constraints, feature extraction strategies, and insufficient flexibility for model evaluation and interpretation. This work presents PAPreC (Pipeline for Antigenicity Prediction Comparison), an open-source, versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines three key factors: the selection of training data sets, feature extraction methods (including physicochemical descriptors and ESM-2 encoder-derived embeddings), and diverse classifiers. It provides automated model evaluation, interpretability through SHapley Additive exPlanations (SHAP) analysis, and applicability domain assessments, enabling researchers to identify optimal configurations for their specific data sets. Applying PAPreC to IEDB data as a reference, we demonstrate its effectiveness across the ESKAPE pathogen group. A case study involving Pseudomonas aeruginosa and Staphylococcus aureus shows that specific feature configurations are more suitable for different sequence types, and that ESM-2 embeddings enhance model performance. Moreover, our results indicate that separate models for Gram-positive and Gram-negative bacteria are not required. PAPreC offers a comprehensive, adaptable, and robust framework to streamline and improve antigenicity prediction for diverse bacterial data sets.
Affiliation (Martins, Yasmmin C.): Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation (Cerqueira e Costa, Maiana O.): Not specified
Affiliation (Palumbo, Miranda Clara): Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Calculo. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Calculo; Argentina
Affiliation (Fernández Do Porto, Darío Augusto): Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Calculo. - Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Calculo; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation (Custódio, Fábio L.): Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
Affiliation (Trevizani, Raphael): Fundación Oswaldo Cruz; Brazil
Affiliation (Nicolás, Marisa Fabiana): Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
- Subject
- Antigenicity prediction; Machine learning classifiers; Feature extraction; SHAP analysis
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/282668
Full record metadata

| id | CONICETDig_5840f5d5453d135c223bde199f1bb741 |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/282668 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| title | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| spellingShingle | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria; Martins, Yasmmin C.; Antigenicity prediction; Machine learning classifiers; Feature extraction; SHAP analysis |
| title_short | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| title_full | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| title_fullStr | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| title_full_unstemmed | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| title_sort | PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria |
| dc.creator.none.fl_str_mv | Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana |
| author | Martins, Yasmmin C. |
| author_facet | Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana |
| author_role | author |
| author2 | Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; Trevizani, Raphael; Nicolás, Marisa Fabiana |
| author2_role | author author author author author author |
| dc.subject.none.fl_str_mv | Antigenicity prediction; Machine learning classifiers; Feature extraction; SHAP analysis |
| topic | Antigenicity prediction; Machine learning classifiers; Feature extraction; SHAP analysis |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-02 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/282668 Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; et al.; PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria; American Chemical Society; ACS Omega; 10; 6; 2-2025; 5415-5429 2470-1343 2470-1343 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/282668 |
| identifier_str_mv | Martins, Yasmmin C.; Cerqueira e Costa, Maiana O.; Palumbo, Miranda Clara; Fernández Do Porto, Darío Augusto; Custódio, Fábio L.; et al.; PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria; American Chemical Society; ACS Omega; 10; 6; 2-2025; 5415-5429 2470-1343 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acsomega.4c07147 info:eu-repo/semantics/altIdentifier/doi/10.1021/acsomega.4c07147 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | American Chemical Society |
| publisher.none.fl_str_mv | American Chemical Society |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1859460312298684416 |
| score | 12.977003 |