Robust estimation of multivariate location and scatter in the presence of missing data
- Authors
- Danilov, Mike; Yohai, Victor Jaime; Zamar, Ruben Horacio
- Year of publication
- 2012
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Two main issues regarding data quality are data contamination (outliers) and data completion (missing data). These two problems have attracted much attention and research, but, surprisingly, they are seldom considered together. Popular robust methods such as S-estimators of multivariate location and scatter offer protection against outliers but cannot deal with missing data, except for the obviously inefficient approach of deleting all incomplete cases. We generalize the definition of S-estimators of multivariate location and scatter to simultaneously deal with missing data and outliers. We show that the proposed estimators are strongly consistent under elliptical models when data are missing completely at random. We derive an algorithm similar to the Expectation-Maximization algorithm for computing the proposed estimators. This algorithm is initialized by an extension of the minimum volume ellipsoid to missing data. We assess the performance of our proposal by Monte Carlo simulation and give some real data examples. This article has supplementary material online.
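The abstract does not spell out the algorithm, so the following is only a rough orientation, not the authors' method. It sketches two per-case quantities that EM-type procedures for incomplete multivariate data commonly rely on, assuming a multivariate normal working model with current estimates `mu` and `sigma`. The first is the squared Mahalanobis distance computed on the observed coordinates only, using the observed sub-block of the scatter matrix. The second is the conditional mean of the missing coordinates given the observed ones, mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o). The robust S-estimation step, which bounds the influence of cases with large distances, and the minimum volume ellipsoid initialization are not reproduced here. The helper names `solve` and `e_step_case` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def e_step_case(
    x: List[Optional[float]], mu: List[float], sigma: Matrix
) -> Tuple[float, List[float]]:
    """For one case (None marks a missing entry), return the squared
    Mahalanobis distance on the observed coordinates and a copy of the
    case with missing entries replaced by their conditional means
    under a normal working model (an assumption of this sketch)."""
    obs = [j for j, v in enumerate(x) if v is not None]
    mis = [j for j, v in enumerate(x) if v is None]
    # Observed block: residual x_o - mu_o and sub-matrix Sigma_oo.
    r = [x[j] - mu[j] for j in obs]
    s_oo = [[sigma[i][j] for j in obs] for i in obs]
    w = solve(s_oo, r)  # w = Sigma_oo^{-1} (x_o - mu_o)
    d2 = sum(ri * wi for ri, wi in zip(r, w))
    completed: List[float] = [v if v is not None else 0.0 for v in x]
    for k in mis:
        # Conditional mean: mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o).
        completed[k] = mu[k] + sum(sigma[k][j] * wj for j, wj in zip(obs, w))
    return d2, completed
```

For example, with `mu = [0.0, 0.0]`, `sigma = [[1.0, 0.5], [0.5, 1.0]]` and the case `[2.0, None]`, the distance uses only the first coordinate (d2 = 4.0) and the missing second coordinate is filled with 0.5 * 2.0 = 1.0. Note that a full EM-type update of the scatter matrix would also need the conditional covariance of the missing block, Sigma_mm - Sigma_mo Sigma_oo^{-1} Sigma_om, which this sketch omits.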
Affiliation: Danilov, Mike. Google; United States
Affiliation: Yohai, Victor Jaime. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo; Argentina
Affiliation: Zamar, Ruben Horacio. University of British Columbia; Canada
- Subjects
Consistent
Elliptical Distribution
EM Algorithm
Fixed Point Equation
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/68375
- Publisher
- American Statistical Association
- Published in
- Journal of the American Statistical Association, vol. 107, no. 499, pp. 1178-1186 (September 2012); ISSN 0162-1459
- DOI
- 10.1080/01621459.2012.699792
- Publisher URL
- https://www.tandfonline.com/doi/full/10.1080/01621459.2012.699792
- Handle
- http://hdl.handle.net/11336/68375