Robust estimation of multivariate location and scatter in the presence of missing data

Authors
Danilov, Mike; Yohai, Victor Jaime; Zamar, Ruben Horacio
Year of publication
2012
Language
English
Resource type
article
Status
published version
Description
Two main issues regarding data quality are data contamination (outliers) and data completeness (missing data). These two problems have attracted much attention and research, but surprisingly they are seldom considered together. Popular robust methods such as S-estimators of multivariate location and scatter offer protection against outliers but cannot deal with missing data, except through the obviously inefficient approach of deleting all incomplete cases. We generalize the definition of S-estimators of multivariate location and scatter to deal simultaneously with missing data and outliers. We show that the proposed estimators are strongly consistent under elliptical models when data are missing completely at random. We derive an algorithm similar to the Expectation-Maximization (EM) algorithm for computing the proposed estimators. This algorithm is initialized by an extension of the minimum volume ellipsoid to missing data. We assess the performance of our proposal by Monte Carlo simulation and give some real data examples. This article has supplementary material online.
Affiliation: Danilov, Mike. Google; United States
Affiliation: Yohai, Victor Jaime. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo; Argentina
Affiliation: Zamar, Ruben Horacio. University of British Columbia; Canada
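The description says the estimators are computed by an algorithm "similar to the Expectation-Maximization algorithm". For orientation only, here is a minimal sketch of the classical, non-robust EM algorithm for the mean and covariance of a multivariate normal sample whose entries are missing completely at random (encoded as NaN). This is not the generalized S-estimator proposed in the article. The article's method additionally protects against outliers and starts from a missing-data extension of the minimum volume ellipsoid, and neither feature is reproduced here. The function name `em_normal_missing` and its `n_iter` and `tol` parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def em_normal_missing(X, n_iter=200, tol=1e-8):
    """Classical EM estimate of the mean and covariance of a multivariate
    normal sample whose missing entries are encoded as NaN.

    Illustrative, non-robust sketch: every row gets equal weight, so
    outliers can still distort the result.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Start from the observed column means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        # E-step: replace missing entries by their conditional means and
        # accumulate the conditional covariances of the missing blocks.
        X_hat = np.where(miss, 0.0, X)
        C = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue  # complete row: nothing to impute
            o = ~m
            if not o.any():
                # Nothing observed: use the current marginal moments.
                X_hat[i] = mu
                C += sigma
                continue
            # Regression coefficients of the missing block on the observed one.
            B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            X_hat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: moments of the completed data plus the imputation correction.
        mu_new = X_hat.mean(axis=0)
        D = X_hat - mu_new
        sigma_new = (D.T @ D + C) / n
        converged = (np.max(np.abs(mu_new - mu)) < tol
                     and np.max(np.abs(sigma_new - sigma)) < tol)
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return mu, sigma
```

With no missing values, the sketch reduces to the sample mean and the maximum-likelihood (divide-by-n) covariance. S-estimators limit the influence of outliers by downweighting observations with large Mahalanobis distances, and this sketch has no such weighting.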
Subject
Consistent
Elliptical Distribution
EM Algorithm
Fixed Point Equation
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/68375

Handle
http://hdl.handle.net/11336/68375
Citation
Danilov, Mike; Yohai, Victor Jaime; Zamar, Ruben Horacio. Robust estimation of multivariate location and scatter in the presence of missing data. Journal of the American Statistical Association, 107(499), 1178-1186 (September 2012).
ISSN
0162-1459
DOI
10.1080/01621459.2012.699792
Publisher's version
https://www.tandfonline.com/doi/full/10.1080/01621459.2012.699792
Publisher
American Statistical Association