Data-driven high-dimensional statistical inference with generative models
- Authors
- Amram, Oz; Szewc, Manuel
- Year of publication
- 2025
- Language
- English
- Resource type
- article
- Status
- published version
- Description
Crucial to many measurements at the LHC is the use of correlated multi-dimensional information to distinguish rare processes from large backgrounds, a task complicated by the poor modeling of many of the key backgrounds in Monte Carlo simulations. In this work, we introduce HI-SIGMA, a method to perform unbinned high-dimensional statistical inference with data-driven background distributions. In contrast to many applications of Simulation-Based Inference in High Energy Physics, HI-SIGMA relies on generative ML models, rather than classifiers, to learn the signal and background distributions in the high-dimensional space. These ML models allow for interpretable inference while also incorporating model errors and other sources of systematic uncertainties. We showcase this methodology on a simplified version of a di-Higgs measurement in the bbγγ final state, where the di-photon resonance allows for background interpolation from sidebands into the signal region. We demonstrate that HI-SIGMA provides improved sensitivity compared with standard classifier-based methods, and that systematic uncertainties can be straightforwardly incorporated by extending methods previously used in histogram-based analyses.
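As a rough illustration of unbinned inference with separate signal and background densities, the sketch below fits a signal strength μ by minimizing a generic extended unbinned negative log-likelihood, NLL(μ) = μS + B − Σᵢ log(μS·p_s(xᵢ) + B·p_b(xᵢ)). It is not the HI-SIGMA implementation: fixed 2D Gaussians stand in (as an assumption) for the learned generative models, the expected yields S and B are taken as known, and sideband interpolation and systematic uncertainties are omitted. All function and constant names are hypothetical.

```python
import math
import random

# ASSUMPTION: fixed analytic densities stand in for the learned generative
# models; the widths and centres below are arbitrary toy choices.
SIG_MEAN, SIG_WIDTH = (1.0, 1.0), 0.3
BKG_MEAN, BKG_WIDTH = (0.0, 0.0), 2.0


def gauss2d(x, mean, width):
    """Normalized isotropic 2D Gaussian density evaluated at point x."""
    r2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-r2 / (2 * width ** 2)) / (2 * math.pi * width ** 2)


def signal_pdf(x):
    return gauss2d(x, SIG_MEAN, SIG_WIDTH)


def background_pdf(x):
    return gauss2d(x, BKG_MEAN, BKG_WIDTH)


def sample_events(rng, n_sig, n_bkg):
    """Draw a toy dataset with n_sig signal and n_bkg background events."""
    def draw(mean, width, n):
        return [(rng.gauss(mean[0], width), rng.gauss(mean[1], width)) for _ in range(n)]
    return draw(SIG_MEAN, SIG_WIDTH, n_sig) + draw(BKG_MEAN, BKG_WIDTH, n_bkg)


def extended_nll(mu, events, n_sig, n_bkg):
    """Extended unbinned NLL in mu, dropping mu-independent constants.

    NLL(mu) = mu*S + B - sum_i log(mu*S*p_s(x_i) + B*p_b(x_i))
    """
    nll = mu * n_sig + n_bkg
    for x in events:
        nll -= math.log(mu * n_sig * signal_pdf(x) + n_bkg * background_pdf(x))
    return nll


def fit_mu(events, n_sig, n_bkg, lo=0.0, hi=5.0, tol=1e-6):
    """Minimize the NLL over mu in [lo, hi] by golden-section search.

    For mu >= 0 the NLL is a linear term plus a sum of -log of positive
    affine functions, hence convex, so the search finds the global minimum.
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if extended_nll(c, events, n_sig, n_bkg) < extended_nll(d, events, n_sig, n_bkg):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    rng = random.Random(1)
    events = sample_events(rng, n_sig=50, n_bkg=1000)
    mu_hat = fit_mu(events, n_sig=50.0, n_bkg=1000.0)
    print(f"fitted signal strength: {mu_hat:.3f}")
```

The convexity of this toy NLL is what justifies the simple bracketing search; an analysis with systematic uncertainties would instead profile over nuisance parameters.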
Affiliation: Amram, Oz. Not specified;
Affiliation: Szewc, Manuel. University of Cincinnati; United States. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Ciencias Físicas - Universidad Nacional de San Martín. Instituto de Ciencias Físicas; Argentina
- Subjects
- MACHINE LEARNING; STATISTICAL INFERENCE; DATA-DRIVEN BACKGROUNDS; DENSITY ESTIMATION
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/289536
Full record metadata
| Field | Value |
|---|---|
| id | CONICETDig_4cb66851facc34c9b7052bf66f59871f |
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/289536 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Data-driven high-dimensional statistical inference with generative models |
| dc.creator.none.fl_str_mv | Amram, Oz; Szewc, Manuel |
| dc.subject.none.fl_str_mv | MACHINE LEARNING; STATISTICAL INFERENCE; DATA-DRIVEN BACKGROUNDS; DENSITY ESTIMATION |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.3; https://purl.org/becyt/ford/1 |
| dc.date.none.fl_str_mv | 2025-11 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/289536; Amram, Oz; Szewc, Manuel; Data-driven high-dimensional statistical inference with generative models; Springer; Journal of High Energy Physics; 2025; 11; 11-2025; 1-34; 1029-8479; CONICET Digital; CONICET |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://link.springer.com/10.1007/JHEP11(2025)129; info:eu-repo/semantics/altIdentifier/doi/10.1007/JHEP11(2025)129 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf; application/pdf |
| dc.publisher.none.fl_str_mv | Springer |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |