On semi-supervised learning

Authors
Cholaquidis, A.; Fraiman, R.; Sued, Raquel Mariela
Year of publication
2020
Language
English
Resource type
article
Status
published version
Description
Major efforts have been made, mostly in the machine learning literature, to construct good predictors that combine unlabelled and labelled data. These methods are known as semi-supervised. They address the problem of how to take advantage, when possible, of a large amount of unlabelled data to perform classification in situations where labelled data are scarce. This is not always feasible: it depends on whether the labels can be inferred from the distribution of the unlabelled data. Nevertheless, several algorithms have been proposed recently. In this work, we present a new method that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the size of the unlabelled sample goes to infinity, even if the size of the labelled sample remains fixed. Its performance and computational time are assessed through simulations and on the well-known “Isolet” real phoneme data, where a strong dependence on the choice of the initial training sample is shown. The main focus of this work is to elucidate when and why semi-supervised learning works in the asymptotic regime described above. The set of necessary assumptions, although reasonable, shows that semi-supervised methods only attain consistency for very well-conditioned problems.
Affiliation: Cholaquidis, A. Universidad de la República; Uruguay
Affiliation: Fraiman, R. Universidad de la República; Uruguay
Affiliation: Sued, Raquel Mariela. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
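The abstract describes propagating label information from a small, fixed labelled sample through a large unlabelled sample. As a rough illustration of this general idea only (a generic nearest-neighbour self-training sketch, not the authors' algorithm; the function name and point format are hypothetical), consider:

```python
def dist(a, b):
    """Euclidean distance between two points given as tuples of coordinates."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def propagate_labels(labelled, unlabelled):
    """Generic self-training by nearest-neighbour label propagation.

    labelled:   list of (point, label) pairs, where point is a coordinate tuple.
    unlabelled: list of points.

    Repeatedly finds the unlabelled point closest to the current labelled set,
    gives it the label of its nearest labelled neighbour, and moves it into the
    labelled set, until no unlabelled points remain.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    while pool:
        # Pick the (unlabelled, labelled) pair at minimum distance.
        u, (_, label) = min(
            ((u, l) for u in pool for l in labelled),
            key=lambda pair: dist(pair[0], pair[1][0]),
        )
        pool.remove(u)
        labelled.append((u, label))
    return labelled
```

With two seed points of opposite labels and unlabelled points clustered around them, each cluster inherits the label of its nearby seed; the sketch ignores the conditions on the data distribution under which such propagation is actually consistent, which is precisely what the paper studies.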
Subject
CONSISTENCY
SEMI-SUPERVISED LEARNING
SMALL TRAINING SAMPLE
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/147485

Publisher
Springer
Citation
Cholaquidis, A.; Fraiman, R.; Sued, Raquel Mariela; On semi-supervised learning; Springer; Test; 29; 4; 12-2020; 914-937
ISSN
1133-0686
DOI
10.1007/s11749-019-00690-2
Links
http://hdl.handle.net/11336/147485
https://link.springer.com/article/10.1007%2Fs11749-019-00690-2
https://arxiv.org/abs/1805.09180
Subject classification (FORD)
https://purl.org/becyt/ford/1.1
https://purl.org/becyt/ford/1