Speeding up the execution of a large number of statistical tests of independence
- Authors
- Schlüter, Federico; Bromberg, Facundo; Pérez, Diego Sebastián
- Year of publication
- 2010
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
When the structure of probabilistic graphical models is learned with the independence-based approach, a massive number of conditional independence tests must be performed on the data. An intermediate step in computing an independence test is the construction of a contingency table from the data. In this work we present an intelligent cache of contingency tables that allows stored tables to be reused not only for the same test, in the not uncommon case that the test must be performed again, but also for an exponential number of other tests: all those involving a subset of the variables of a stored test. In practice, however, relatively few tests actually reuse the stored tables. We show results of testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks (also known as undirected graphical models). The experiments show that, in all cases, the cache saves more than 95% of the running time that IBMAP-HC spends reading data. The savings in running time for IBMAP-HC were up to 80% for datasets with more than 40,000 data points.
Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Computer Science
statistical tests of independence
contingency tables
probabilistic graphical models
structure learning
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/152584
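The abstract describes the cache only at a high level. The Python sketch below illustrates the general idea under stated assumptions, and it is not the authors' implementation. Data are rows of discrete values. A contingency table is a `Counter` of observed value combinations. A table requested for a subset of the variables of any stored table is obtained by summing counts (marginalization) rather than by re-reading the data. The class name `ContingencyTableCache`, the function `g_squared`, the choice of the G² (likelihood-ratio) statistic as the example test, and the linear search over stored tables are all hypothetical illustrative choices. The paper may use a different test and a different lookup structure.

```python
import math
from collections import Counter


class ContingencyTableCache:
    """Hypothetical cache of contingency tables keyed by a set of variables.

    A table for a variable set V maps each observed combination of values of V
    (listed in sorted variable order) to its count in the data. The table for any
    subset W of V is obtained by summing counts over the variables not in W, so
    it does not require another pass over the data.
    """

    def __init__(self, data):
        self.data = data      # list of rows, each a tuple of discrete values
        self.tables = {}      # sorted tuple of variable indices -> Counter
        self.data_scans = 0   # how many times the full data set was read

    def _count(self, key):
        # Build a table from scratch: one full pass over the data.
        self.data_scans += 1
        return Counter(tuple(row[v] for v in key) for row in self.data)

    @staticmethod
    def _marginalize(stored_key, stored, key):
        # Sum out the variables of stored_key that are not in key.
        positions = [stored_key.index(v) for v in key]
        marginal = Counter()
        for values, count in stored.items():
            marginal[tuple(values[p] for p in positions)] += count
        return marginal

    def table(self, variables):
        key = tuple(sorted(set(variables)))
        if key in self.tables:  # exact hit: the same table was requested before
            return self.tables[key]
        # Subset hit: some stored table covers all requested variables.
        # (Assumption: a simple linear scan over stored tables.)
        for stored_key, stored in self.tables.items():
            if set(key) <= set(stored_key):
                return self._marginalize(stored_key, stored, key)
        # Miss: read the data once and store the new table.
        self.tables[key] = self._count(key)
        return self.tables[key]


def g_squared(cache, x, y, z=()):
    """G^2 (likelihood-ratio) statistic for X independent of Y given Z.

    Assumes x != y and that neither appears in z. Only the joint table may
    require a data pass; the marginals are requested for subsets of it.
    """
    full = tuple(sorted({x, y, *z}))
    xz = tuple(sorted({x, *z}))
    yz = tuple(sorted({y, *z}))
    zs = tuple(sorted(set(z)))
    joint = cache.table(full)
    n_xz, n_yz, n_z = cache.table(xz), cache.table(yz), cache.table(zs)

    def project(values, sub):
        # Pick from a joint-table cell the values of the variables in `sub`.
        return tuple(values[full.index(v)] for v in sub)

    g2 = 0.0
    for values, n in joint.items():  # unobserved cells contribute 0 to G^2
        ratio = n * n_z[project(values, zs)] / (
            n_xz[project(values, xz)] * n_yz[project(values, yz)])
        g2 += 2.0 * n * math.log(ratio)
    return g2


# Usage: two tests over variables {0, 1, 2} share a single pass over the data.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
cache = ContingencyTableCache(data)
g_01_given_2 = g_squared(cache, 0, 1, z=(2,))
g_02 = g_squared(cache, 0, 2)  # {0, 2} is a subset of the stored {0, 1, 2}
print(cache.data_scans)  # 1
```

Because a table over k variables has 2^k variable subsets, one stored table can answer many later requests without touching the data, which is the kind of reuse the abstract refers to. Turning the statistic into a p-value would additionally require the degrees of freedom and a chi-square distribution, which this sketch omits.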
- Record URL
- http://sedici.unlp.edu.ar/handle/10915/152584
- Full text (PDF)
- http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-05.pdf
- ISSN
- 1850-2784
- License
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Format
- application/pdf
- Pages
- 48-59