Multivariate location and scatter matrix estimation under cellwise and casewise contamination

Authors
Leung, Andy; Yohai, Victor Jaime; Zamar, Ruben Horacio
Publication year
2017
Language
English
Resource type
article
Status
published version
Description
Real data may contain both cellwise and casewise outliers. There is a vast literature on robust estimation under casewise outliers, but only a scant one for cellwise outliers and almost none for both types together. Estimation of multivariate location and scatter matrix is a cornerstone of multivariate data analysis. A two-step approach was recently proposed to perform robust estimation of multivariate location and scatter matrix in the presence of cellwise and casewise outliers: in the first step, a univariate filter is applied to remove cellwise outliers; in the second step, a generalized S-estimator is used to downweight casewise outliers. This proposal can be improved in three main directions: first, by introducing a consistent bivariate filter to be used in combination with the univariate filter in the first step; second, by proposing a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step; and third, by using a non-monotonic weight function in the generalized S-estimator to better handle casewise outliers in high dimensions. A simulation study and a real data example show that, unlike the original two-step procedure, the modified two-step approach performs well and scales well in high dimensions. They also show that the modified procedure outperforms the original one and other state-of-the-art robust procedures under cellwise and casewise contamination.
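Two ingredients of the procedure described above can be illustrated with a minimal Python sketch. The first function is a univariate adaptive filter in the spirit of the one used in step one: each value is standardized with a robust center and scale (median and MAD here, an assumption), and the squared scores are compared with the tail of a chi-square distribution with 1 degree of freedom, flagging as many of the largest cells as the observed tail excess warrants. The second is a Rocke-type weight function, one common example of a non-monotonic weight: it is zero for both very small and very large scaled distances. The function names, the 0.95 default, and the choice of Rocke weights are illustrative assumptions, not the authors' implementation; the bivariate filter, the subsampling scheme, and the generalized S-estimator itself are not shown.

```python
import math
from statistics import NormalDist, median


def chi2_1_cdf(x):
    """CDF of a chi-square variable with 1 degree of freedom: P(Z**2 <= x)."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0


def univariate_filter(column, alpha=0.95):
    """Hypothetical sketch: flag cellwise outliers in one variable.

    Returns one boolean per value; True marks a cell the filter would
    remove (set to missing) before the second, casewise step.
    """
    n = len(column)
    center = median(column)
    # MAD rescaled to be consistent at the normal distribution
    scale = 1.4826 * median(abs(v - center) for v in column)
    if scale == 0.0:
        return [False] * n  # degenerate column: no robust scale, flag nothing
    z2 = [((v - center) / scale) ** 2 for v in column]
    z2_sorted = sorted(z2)
    # eta: alpha-quantile of chi2(1), obtained from the standard normal quantile
    eta = NormalDist().inv_cdf((1.0 + alpha) / 2.0) ** 2
    # Largest excess of the reference CDF over the empirical CDF beyond eta,
    # kept on the count scale (n times the excess) to avoid rounding loss.
    excess = 0.0
    for i, t in enumerate(z2_sorted, start=1):
        if t >= eta:
            excess = max(excess, n * chi2_1_cdf(t) - (i - 1))
    k = math.floor(excess)  # number of largest cells to flag
    if k == 0:
        return [False] * n
    threshold = z2_sorted[n - k]
    return [t >= threshold for t in z2]


def rocke_weight(u, gamma):
    """Rocke-type weight for a scaled squared distance u (assumed form).

    Positive only for 1 - gamma < u < 1 + gamma, so both unusually
    small and unusually large distances receive zero weight.
    """
    v = (u - 1.0) / gamma
    if abs(v) >= 1.0:
        return 0.0
    return 0.75 / gamma * (1.0 - v * v)
```

For instance, applying `univariate_filter` to twenty-one evenly spaced values between -1 and 1 plus one gross value of 50 flags only the gross value, while the same column without it flags nothing. The weight function peaks at u = 1 and drops to zero on both sides, which is what distinguishes it from a monotonic (for example, Tukey bisquare) weight.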
Affiliation: Leung, Andy. University of British Columbia; Canada
Affiliation: Yohai, Victor Jaime. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Matemática; Argentina
Affiliation: Zamar, Ruben Horacio. University of British Columbia; Canada
Subjects
Cellwise Outliers
Componentwise Contamination
Multivariate Location And Scatter
Robust Estimation
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/66009

Publication date
2017-07
Publisher
Elsevier Science
Journal
Computational Statistics and Data Analysis, vol. 111, July 2017, pp. 59-76
ISSN
0167-9473
DOI
10.1016/j.csda.2017.02.007
Handle
http://hdl.handle.net/11336/66009
Publisher URL
https://www.sciencedirect.com/science/article/pii/S0167947317300270
Format
application/pdf