Speeding up the execution of a large number of statistical tests of independence

Autores
Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián
Año de publicación
2010
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
A massive amount of conditional independence tests on data must be performed in the problem of learning the structure of probabilistic graphical models when using the independence-based approach. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows the tables stored to be reused not only for the same test, in the not uncommon case that the test must be performed again, but for an exponential number of other tests, all those involving a subset of the variables of the test stored. In practice, however, not so many tests actually reuse the tables stored. We show results when testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, a.k.a. undirected graphical models. The experiments show that in all cases, above 95% of the running time spent by IBMAP-HC in reading data is saved by the cache. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.
Sociedad Argentina de Informática e Investigación Operativa
Materia
Ciencias Informáticas
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repositorio
SEDICI (UNLP)
Institución
Universidad Nacional de La Plata
OAI Identificador
oai:sedici.unlp.edu.ar:10915/152584

id SEDICI_25d70904355930866078ae3af18370ce
oai_identifier_str oai:sedici.unlp.edu.ar:10915/152584
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
spelling Speeding up the execution of a large number of statistical tests of independenceSchlüter, FedericoBromberg, FacundoPérez, Diego SebastiánCiencias Informáticasstatistical tests of independencecontingency tablesprobabilistic graphical modelsstructure learningA massive amount of conditional independence tests on data must be performed in the problem of learning the structure of probabilistic graphical models when using the independence-based approach. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows the tables stored to be reused not only for the same test, in the not uncommon case that the test must be performed again, but for an exponential number of other tests, all those involving a subset of the variables of the test stored. In practice, however, not so many tests actually reuse the tables stored. We show results when testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, a.k.a. undirected graphical models. The experiments show that in all cases, above 95% of the running time spent by IBMAP-HC in reading data is saved by the cache. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.Sociedad Argentina de Informática e Investigación Operativa2010info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf48-59http://sedici.unlp.edu.ar/handle/10915/152584enginfo:eu-repo/semantics/altIdentifier/url/http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdfinfo:eu-repo/semantics/altIdentifier/issn/1850-2784info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:39:21Zoai:sedici.unlp.edu.ar:10915/152584Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:39:21.754SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv Speeding up the execution of a large number of statistical tests of independence
title Speeding up the execution of a large number of statistical tests of independence
spellingShingle Speeding up the execution of a large number of statistical tests of independence
Schlüter, Federico
Ciencias Informáticas
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
title_short Speeding up the execution of a large number of statistical tests of independence
title_full Speeding up the execution of a large number of statistical tests of independence
title_fullStr Speeding up the execution of a large number of statistical tests of independence
title_full_unstemmed Speeding up the execution of a large number of statistical tests of independence
title_sort Speeding up the execution of a large number of statistical tests of independence
dc.creator.none.fl_str_mv Schlüter, Federico
Bromberg, Facundo
Pérez, Diego Sebastián
author Schlüter, Federico
author_facet Schlüter, Federico
Bromberg, Facundo
Pérez, Diego Sebastián
author_role author
author2 Bromberg, Facundo
Pérez, Diego Sebastián
author2_role author
author
dc.subject.none.fl_str_mv Ciencias Informáticas
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
topic Ciencias Informáticas
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
dc.description.none.fl_txt_mv A massive amount of conditional independence tests on data must be performed in the problem of learning the structure of probabilistic graphical models when using the independence-based approach. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows the tables stored to be reused not only for the same test, in the not uncommon case that the test must be performed again, but for an exponential number of other tests, all those involving a subset of the variables of the test stored. In practice, however, not so many tests actually reuse the tables stored. We show results when testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, a.k.a. undirected graphical models. The experiments show that in all cases, above 95% of the running time spent by IBMAP-HC in reading data is saved by the cache. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.
Sociedad Argentina de Informática e Investigación Operativa
description A massive amount of conditional independence tests on data must be performed in the problem of learning the structure of probabilistic graphical models when using the independence-based approach. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows the tables stored to be reused not only for the same test, in the not uncommon case that the test must be performed again, but for an exponential number of other tests, all those involving a subset of the variables of the test stored. In practice, however, not so many tests actually reuse the tables stored. We show results when testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, a.k.a. undirected graphical models. The experiments show that in all cases, above 95% of the running time spent by IBMAP-HC in reading data is saved by the cache. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.
publishDate 2010
dc.date.none.fl_str_mv 2010
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/152584
url http://sedici.unlp.edu.ar/handle/10915/152584
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf
info:eu-repo/semantics/altIdentifier/issn/1850-2784
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
48-59
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
reponame_str SEDICI (UNLP)
collection SEDICI (UNLP)
instname_str Universidad Nacional de La Plata
instacron_str UNLP
institution UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1844616267603902464
score 13.070432