Null hypothesis test for anomaly detection

Authors
Kamenik, Jernej F.; Szewc, Manuel
Publication year
2023
Language
English
Resource type
article
Status
published version
Description
We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions. The method relies on the assumption of conditional independence of anomaly score features and dataset regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information represents a suitable test for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
Affiliation: Kamenik, Jernej F. University of Ljubljana; Slovenia
Affiliation: Szewc, Manuel. University of Ljubljana; Slovenia. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Ciencias Físicas. - Universidad Nacional de San Martín. Instituto de Ciencias Físicas; Argentina
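The idea in the description, testing whether an anomaly score is statistically independent of the dataset region, can be illustrated with a small sketch. This is not the authors' implementation: the toy scores stand in for a trained Classification Without Labels classifier, and the binned plug-in mutual information estimator, the permutation-based p-value, and the names `mutual_information` and `permutation_p_value` are all assumptions chosen for illustration.

```python
import numpy as np

def mutual_information(scores, regions, n_bins=10):
    """Plug-in estimate (in nats) of I(score bin; region) from a 2D histogram."""
    # Quantile bin edges, so each score bin holds a similar number of events.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (bins, regions), 1.0)      # count events per (bin, region)
    joint /= joint.sum()
    p_bin = joint.sum(axis=1, keepdims=True)    # marginal over regions
    p_reg = joint.sum(axis=0, keepdims=True)    # marginal over score bins
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_bin @ p_reg)[mask])))

def permutation_p_value(scores, regions, n_perm=500, seed=0):
    """p-value for the null hypothesis that score and region are independent."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(scores, regions)
    # Shuffling the region labels destroys any dependence on the score.
    null = np.array([mutual_information(scores, rng.permutation(regions))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Toy data (hypothetical): region label 0/1 and a score in [0, 1].
rng = np.random.default_rng(1)
n = 4000
regions = rng.integers(0, 2, size=n)
scores_bkg = rng.uniform(size=n)                # background only: independent of region
scores_sig = scores_bkg.copy()
is_signal = (regions == 1) & (rng.uniform(size=n) < 0.1)
scores_sig[is_signal] = rng.uniform(0.9, 1.0, size=is_signal.sum())  # signal in region 1

mi_b, p_b = permutation_p_value(scores_bkg, regions)
mi_s, p_s = permutation_p_value(scores_sig, regions)
print("background only:", mi_b, p_b)
print("with signal:    ", mi_s, p_s)
```

In this sketch a small p-value indicates that the score carries information about the region, which is the sense in which the background-only hypothesis would be disfavoured. No cut on the score is needed, since the statistic uses the whole binned distribution.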
Subject
LHC
ANOMALY DETECTION
CWOLA
UNSUPERVISED
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/237150
oai:ri.conicet.gov.ar:11336/237150

id CONICETDig_29853a537cec447e1ad11fc1558d9078
oai_identifier_str oai:ri.conicet.gov.ar:11336/237150
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv Null hypothesis test for anomaly detection
dc.creator.none.fl_str_mv Kamenik, Jernej F.
Szewc, Manuel
dc.subject.none.fl_str_mv LHC
ANOMALY DETECTION
CWOLA
UNSUPERVISED
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.3
https://purl.org/becyt/ford/1
publishDate 2023
dc.date.none.fl_str_mv 2023-05
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/237150
Kamenik, Jernej F.; Szewc, Manuel; Null hypothesis test for anomaly detection; Elsevier Science; Physics Letters B; 840; 5-2023; 1-8
0370-2693
CONICET Digital
CONICET
url http://hdl.handle.net/11336/237150
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S0370269323001703
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.physletb.2023.137836
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Elsevier Science
publisher.none.fl_str_mv Elsevier Science
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar