Detecting influential observations in principal components and common principal components

Authors
Boente Boente, Graciela Lina; Pires, Ana M.; Rodrigues, Isabel M.
Publication year
2010
Language
English
Resource type
article
Status
published version
Description
Detecting outlying observations is an important step in any analysis, even when robust estimates are used. In particular, the robustified Mahalanobis distance is a natural measure of outlyingness if one focuses on ellipsoidal distributions. However, it is well known that the asymptotic chi-square approximation for the cutoff value of the Mahalanobis distance based on several robust estimates (like the minimum volume ellipsoid, the minimum covariance determinant and the S-estimators) is not adequate for detecting atypical observations in small samples from the normal distribution. In the multi-population setting and under a common principal components model, aggregated measures based on standardized empirical influence functions are used to detect observations with a significant impact on the estimators. As in the one-population setting, the cutoff values obtained from the asymptotic distribution of those aggregated measures are not adequate for small samples. More appropriate cutoff values, adapted to the sample sizes, can be computed by using a cross-validation approach. Cutoff values obtained from a Monte Carlo study using S-estimators are provided for illustration. A real data set is also analyzed.
Affiliation: Boente Boente, Graciela Lina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigaciones Matemáticas "Luis A. Santaló"; Argentina
Affiliation: Pires, Ana M. Technical University of Lisbon; Portugal
Affiliation: Rodrigues, Isabel M. Technical University of Lisbon; Portugal
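The description above notes that asymptotic chi-square cutoffs for robust Mahalanobis distances can be inadequate in small samples, and that cutoffs adapted to the sample size can be obtained by simulation. The following is a minimal sketch of that idea, not the authors' procedure: it assumes dimension p = 2 (so the chi-square quantile has a closed form) and uses a single concentration step from the classical estimates as a crude stand-in for the MVE, MCD or S-estimators named in the abstract. All function names are hypothetical.

```python
import math
import numpy as np


def chi2_cutoff_p2(level):
    """Asymptotic cutoff: chi-square quantile with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - level)


def consistency_factor_p2(alpha):
    """Factor making a covariance trimmed to the central fraction alpha
    consistent at the bivariate normal (closed form for p = 2 only)."""
    q = -2.0 * math.log(1.0 - alpha)            # chi2_2 quantile at alpha
    cdf4 = 1.0 - (1.0 - alpha) * (1.0 + q / 2)  # chi2_4 CDF evaluated at q
    return alpha / cdf4


def squared_distances(x, loc, scatter):
    """Squared Mahalanobis distances of the rows of x."""
    centered = x - loc
    return np.einsum("ij,jk,ik->i", centered, np.linalg.inv(scatter), centered)


def trimmed_estimates(x):
    """Assumed stand-in estimator: keep the half-sample closest to the
    classical fit, then re-estimate location and scatter from it."""
    n = x.shape[0]
    h = n // 2
    d2 = squared_distances(x, x.mean(axis=0), np.cov(x, rowvar=False))
    keep = np.argsort(d2)[:h]
    loc = x[keep].mean(axis=0)
    scatter = consistency_factor_p2(h / n) * np.cov(x[keep], rowvar=False)
    return loc, scatter


def monte_carlo_cutoff(n, level=0.975, reps=2000, seed=0):
    """Empirical quantile of robust squared distances over simulated
    bivariate standard normal samples of size n."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(reps):
        x = rng.standard_normal((n, 2))
        loc, scatter = trimmed_estimates(x)
        pooled.append(squared_distances(x, loc, scatter))
    return float(np.quantile(np.concatenate(pooled), level))


if __name__ == "__main__":
    print("asymptotic cutoff :", chi2_cutoff_p2(0.975))
    print("simulated, n = 20 :", monte_carlo_cutoff(20))
```

Running the script prints the asymptotic value next to the simulated one for n = 20, so the two can be compared directly; repeating the call with larger n shows how the simulated cutoff behaves as the sample grows. A production analysis would replace the stand-in estimator with a proper robust estimator and the aggregated influence measures described in the article.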
Subject
Common Principal Components
Detection of Outliers
Influence Functions
Robust Estimation
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/15025

Subject classification
https://purl.org/becyt/ford/1.1
https://purl.org/becyt/ford/1
Handle
http://hdl.handle.net/11336/15025
Citation
Boente Boente, Graciela Lina; Pires, Ana M.; Rodrigues, Isabel M.; Detecting influential observations in principal components and common principal components; Elsevier Science; Computational Statistics And Data Analysis; 54; 12; 12-2010; 2967-2975
ISSN
0167-9473
Publisher URL
http://www.sciencedirect.com/science/article/pii/S0167947310000022
DOI
10.1016/j.csda.2010.01.001