Automatic catalog of RR Lyrae from ∼14 million VVV light curves: How far can we go with traditional machine-learning?

Authors: Cabral, Juan Bautista; Ramos Almendares, Felipe Alberto; Gurovich, Sebastian; Granitto, Pablo Miguel
Publication year: 2020
Language: English
Resource type: article
Status: published version
Description: Context. Creating a 3D map of the Galactic bulge using RR Lyrae (RRL) stars is one of the main goals of the VISTA Variables in the Via Lactea Survey (VVV) and its extension, VVV(X). The overwhelming number of sources to be analyzed requires automatic procedures. In this context, previous studies have introduced machine learning (ML) methods for variable star classification. Aims. Our goal is to develop and test a fully automatic ML-based procedure for identifying RRLs in the VVV Survey, intended to generate reliable catalogs integrated over several survey tiles. Methods. After reconstructing the light curves, we extracted a set of period- and intensity-based features that were defined in previous works. In addition, we used a new subset of color features for the first time. We discuss in detail the steps needed to define our fully automatic pipeline, namely the selection of quality measurements, sampling procedures, classifier setup, and model selection. Results. We constructed an ensemble classifier with an average recall of 0.48 and an average precision of 0.86 over 15 tiles. We have made all our processed datasets available and published a catalog of candidate RRLs. Conclusions. Perhaps most interestingly, from the perspective of classification based on broad-band photometric data, our results indicate that color is an informative feature type for the RRL target class and should always be considered in automatic ML classification methods. We also argue that recall and precision, reported both in tables and as curves, are high-quality metrics for this highly imbalanced problem. Furthermore, we show for our VVV dataset that obtaining good estimates requires making greater use of the original class distribution than of reduced, artificially balanced samples. Finally, we show that ensemble classifiers help resolve the crucial model-selection step, and that most errors in identifying RRLs are related either to low-quality observations of some sources or to the greater difficulty of resolving the RRL-C type with the available data.
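The point that precision and recall should be estimated on the original, highly imbalanced class distribution rather than on artificially balanced subsamples can be illustrated with a short calculation. This is a sketch, not the authors' pipeline: it assumes a classifier with a fixed recall (true-positive rate) and a fixed false-positive rate, and the helper `precision_at_prevalence`, the false-positive rate of 1e-4, and the RRL prevalence of 1 in 1000 are hypothetical values chosen only for illustration. The recall of 0.48 is borrowed from the reported average purely as an example input.

```python
"""Sketch: how class prevalence changes precision while recall stays fixed.

Hypothetical illustration; the rates below are assumptions, not values
taken from the VVV analysis.
"""


def precision_at_prevalence(recall: float, fpr: float, prevalence: float) -> float:
    """Expected precision for a classifier with a fixed recall (true-positive
    rate) and false-positive rate `fpr`, applied to a sample in which a
    fraction `prevalence` of the sources are positives (here, RRLs)."""
    tp = recall * prevalence       # expected true positives per source
    fp = fpr * (1.0 - prevalence)  # expected false positives per source
    return tp / (tp + fp)


if __name__ == "__main__":
    RECALL = 0.48  # illustrative input: the reported average recall
    FPR = 1e-4     # hypothetical false-positive rate, not from the paper
    for label, prevalence in [
        ("artificially balanced subsample", 0.5),
        ("hypothetical survey prevalence", 1e-3),
    ]:
        p = precision_at_prevalence(RECALL, FPR, prevalence)
        print(f"{label}: recall={RECALL:.2f}, precision={p:.3f}")
```

With these assumed rates, the same classifier reaches a precision of about 1.000 on a balanced subsample but only about 0.83 at a 1-in-1000 prevalence, while its recall is unchanged. This is one way to see why performance estimates drawn from artificially balanced samples can be over-optimistic for a rare class such as RRLs.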
Affiliation: Cabral, Juan Bautista. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Affiliation: Ramos Almendares, Felipe Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomía Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomía Teórica y Experimental; Argentina
Affiliation: Gurovich, Sebastian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Astronomía Teórica y Experimental. Universidad Nacional de Córdoba. Observatorio Astronómico de Córdoba. Instituto de Astronomía Teórica y Experimental; Argentina
Affiliation: Granitto, Pablo Miguel. University of Western Sydney; Australia
Subjects: CATALOGS
GALAXY: BULGE
METHODS: DATA ANALYSIS
METHODS: STATISTICAL
STARS: VARIABLES: RR LYRAE
SURVEYS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/143182


Subject classification (FORD): https://purl.org/becyt/ford/1.3; https://purl.org/becyt/ford/1
Publication date: 2020-10
Publisher: EDP Sciences
Journal: Astronomy and Astrophysics, vol. 642, October 2020, pp. 1-38
ISSN: 0004-6361
DOI: 10.1051/0004-6361/202038314
Publisher version: https://www.aanda.org/10.1051/0004-6361/202038314
arXiv version: https://arxiv.org/abs/2005.00220
Handle: http://hdl.handle.net/11336/143182
Resource type identifiers: info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo
File format: application/pdf
Source repository: CONICET Digital (CONICET), Consejo Nacional de Investigaciones Científicas y Técnicas
Repository contact: dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
