Detecting influential observations in principal components and common principal components
- Authors
- Boente Boente, Graciela Lina; Pires, Ana M.; Rodrigues, Isabel M.
- Publication year
- 2010
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Detecting outlying observations is an important step in any analysis, even when robust estimates are used. In particular, the robustified Mahalanobis distance is a natural measure of outlyingness if one focuses on ellipsoidal distributions. However, it is well known that the asymptotic chi-square approximation for the cutoff value of the Mahalanobis distance based on several robust estimates (like the minimum volume ellipsoid, the minimum covariance determinant and the S-estimators) is not adequate for detecting atypical observations in small samples from the normal distribution. In the multi-population setting and under a common principal components model, aggregated measures based on standardized empirical influence functions are used to detect observations with a significant impact on the estimators. As in the one-population setting, the cutoff values obtained from the asymptotic distribution of those aggregated measures are not adequate for small samples. More appropriate cutoff values, adapted to the sample sizes, can be computed by using a cross-validation approach. Cutoff values obtained from a Monte Carlo study using S-estimators are provided for illustration. A real data set is also analyzed.
Affiliation: Boente Boente, Graciela Lina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigaciones Matemáticas "Luis A. Santaló"; Argentina
Affiliation: Pires, Ana M. Technical University of Lisbon; Portugal
Affiliation: Rodrigues, Isabel M. Technical University of Lisbon; Portugal
- Subject
- Common Principal Components
- Detection of Outliers
- Influence Functions
- Robust Estimation
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/15025
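The description above notes that the asymptotic chi-square cutoff for robust Mahalanobis distances is not adequate in small samples, and that cutoffs adapted to the sample size can be obtained by simulation. The following is only a minimal sketch of that idea for the one-population case, not the authors' procedure: it swaps in a crude MCD-type estimator (random starts plus concentration steps) for the S-estimators used in the paper, fixes the dimension at p = 2 so that the chi-square quantile has a closed form, and rescales distances by their median. All function names and default settings are hypothetical.

```python
import numpy as np


def chi2_quantile_p2(q):
    """Exact chi-square quantile with 2 degrees of freedom (closed form)."""
    return -2.0 * np.log(1.0 - q)


def sq_mahalanobis(x, mu, cov):
    """Squared Mahalanobis distance of each row of x from (mu, cov)."""
    diff = x - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)


def crude_mcd(x, h, rng, n_starts=5, n_csteps=10):
    """Crude MCD-type fit (an assumption of this sketch, not an S-estimator).

    From random (p + 1)-point starts, repeatedly refit on the h points
    closest to the current fit, and keep the fit with smallest determinant.
    """
    n, p = x.shape
    best_det, best_mu, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_csteps):
            mu = x[idx].mean(axis=0)
            cov = np.cov(x[idx], rowvar=False)
            idx = np.argsort(sq_mahalanobis(x, mu, cov))[:h]
        mu = x[idx].mean(axis=0)
        cov = np.cov(x[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_mu, best_cov = det, mu, cov
    return best_mu, best_cov


def robust_sq_distances(x, rng):
    """Robust squared distances, rescaled so their median matches chi2_2."""
    n, p = x.shape
    h = (n + p + 1) // 2
    mu, cov = crude_mcd(x, h, rng)
    d2 = sq_mahalanobis(x, mu, cov)
    return d2 * chi2_quantile_p2(0.5) / np.median(d2)


def monte_carlo_cutoff(n, level=0.975, n_rep=200, seed=0):
    """Quantile of robust squared distances over simulated normal samples."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(
        [robust_sq_distances(rng.standard_normal((n, 2)), rng)
         for _ in range(n_rep)]
    )
    return np.quantile(pooled, level)
```

Calling `monte_carlo_cutoff(n=20)` and comparing the result with `chi2_quantile_p2(0.975)` gives a rough sense of how far the simulated small-sample cutoff sits from the asymptotic one. The aggregated influence measures for the common principal components model and the cross-validation approach described in the paper are not reproduced here.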
id |
CONICETDig_96a886945eced39ef7b4e528b17eadbe |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/15025 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
Detecting influential observations in principal components and common principal components |
dc.creator.none.fl_str_mv |
Boente Boente, Graciela Lina Pires, Ana M. Rodrigues, Isabel M. |
dc.subject.none.fl_str_mv |
Common Principal Components Detection of Outliers Influence Functions Robust Estimation |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.1 https://purl.org/becyt/ford/1 |
publishDate |
2010 |
dc.date.none.fl_str_mv |
2010-12 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/15025 Boente Boente, Graciela Lina; Pires, Ana M.; Rodrigues, Isabel M.; Detecting influential observations in principal components and common principal components; Elsevier Science; Computational Statistics And Data Analysis; 54; 12; 12-2010; 2967-2975 0167-9473 |
url |
http://hdl.handle.net/11336/15025 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0167947310000022 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.csda.2010.01.001 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier Science |
publisher.none.fl_str_mv |
Elsevier Science |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |