Data-driven high-dimensional statistical inference with generative models

Authors
Amram, Oz; Szewc, Manuel
Publication year
2025
Language
English
Resource type
article
Status
published version
Description
Crucial to many measurements at the LHC is the use of correlated multi-dimensional information to distinguish rare processes from large backgrounds, which is complicated by the poor modeling of many of the crucial backgrounds in Monte Carlo simulations. In this work, we introduce HI-SIGMA, a method to perform unbinned high-dimensional statistical inference with data-driven background distributions. In contrast to many applications of Simulation-Based Inference in High Energy Physics, HI-SIGMA relies on generative ML models, rather than classifiers, to learn the signal and background distributions in the high-dimensional space. These ML models allow for interpretable inference while also incorporating model errors and other sources of systematic uncertainties. We showcase this methodology on a simplified version of a di-Higgs measurement in the bbγγ final state, where the di-photon resonance allows for background interpolation from sidebands into the signal region. We demonstrate that HI-SIGMA provides improved sensitivity compared to standard classifier-based methods, and that systematic uncertainties can be straightforwardly incorporated by extending methods that have been used for histogram-based analyses.
Affiliation: Amram, Oz. Not specified;
Affiliation: Szewc, Manuel. University of Cincinnati; United States. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Ciencias Físicas. - Universidad Nacional de San Martín. Instituto de Ciencias Físicas.; Argentina
Subject
MACHINE LEARNING
STATISTICAL INFERENCE
DATA-DRIVEN BACKGROUNDS
DENSITY ESTIMATION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/289536

id CONICETDig_4cb66851facc34c9b7052bf66f59871f
oai_identifier_str oai:ri.conicet.gov.ar:11336/289536
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv Data-driven high-dimensional statistical inference with generative models
dc.creator.none.fl_str_mv Amram, Oz
Szewc, Manuel
dc.subject.none.fl_str_mv MACHINE LEARNING
STATISTICAL INFERENCE
DATA-DRIVEN BACKGROUNDS
DENSITY ESTIMATION
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.3
https://purl.org/becyt/ford/1
publishDate 2025
dc.date.none.fl_str_mv 2025-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/289536
Amram, Oz; Szewc, Manuel; Data-driven high-dimensional statistical inference with generative models; Springer; Journal of High Energy Physics; 2025; 11; 11-2025; 1-34
1029-8479
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://link.springer.com/10.1007/JHEP11(2025)129
info:eu-repo/semantics/altIdentifier/doi/10.1007/JHEP11(2025)129
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar