Combining semi-supervised and active learning to recognize minority senses in a new corpus

Autores
Cardellino, Cristian Adrián; Teruel, Milagro; Alonso i Alemany, Laura
Año de publicación
2015
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
Ponencia presentada en la 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, del 25 al 31 de julio de 2015.
Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.
Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Otras Ciencias de la Computación e Información
Materia
Natural language processing
Active learning
Semi-supervised learning
Nivel de accesibilidad
acceso abierto
Condiciones de uso
Repositorio
Repositorio Digital Universitario (UNC)
Institución
Universidad Nacional de Córdoba
OAI Identificador
oai:rdu.unc.edu.ar:11086/22132

id RDUUNC_4e8e104f7022932d76730ed865faa184
oai_identifier_str oai:rdu.unc.edu.ar:11086/22132
network_acronym_str RDUUNC
repository_id_str 2572
network_name_str Repositorio Digital Universitario (UNC)
spelling Combining semi-supervised and active learning to recognize minority senses in a new corpusCardellino, Cristian AdriánTeruel, MilagroAlonso i Alemany, LauraNatural language processingActive learningSemi-supervised learningPonencia presentada en la 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, del 25 al 31 de julio de 2015.Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.Otras Ciencias de la Computación e Información2015info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdfhttp://hdl.handle.net/11086/22132enginfo:eu-repo/semantics/openAccessreponame:Repositorio Digital Universitario (UNC)instname:Universidad Nacional de Córdobainstacron:UNC2025-09-29T13:44:26Zoai:rdu.unc.edu.ar:11086/22132Institucionalhttps://rdu.unc.edu.ar/Universidad públicaNo correspondehttp://rdu.unc.edu.ar/oai/snrdoca.unc@gmail.comArgentinaNo correspondeNo correspondeNo correspondeopendoar:25722025-09-29 13:44:26.603Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdobafalse
dc.title.none.fl_str_mv Combining semi-supervised and active learning to recognize minority senses in a new corpus
title Combining semi-supervised and active learning to recognize minority senses in a new corpus
spellingShingle Combining semi-supervised and active learning to recognize minority senses in a new corpus
Cardellino, Cristian Adrián
Natural language processing
Active learning
Semi-supervised learning
title_short Combining semi-supervised and active learning to recognize minority senses in a new corpus
title_full Combining semi-supervised and active learning to recognize minority senses in a new corpus
title_fullStr Combining semi-supervised and active learning to recognize minority senses in a new corpus
title_full_unstemmed Combining semi-supervised and active learning to recognize minority senses in a new corpus
title_sort Combining semi-supervised and active learning to recognize minority senses in a new corpus
dc.creator.none.fl_str_mv Cardellino, Cristian Adrián
Teruel, Milagro
Alonso i Alemany, Laura
author Cardellino, Cristian Adrián
author_facet Cardellino, Cristian Adrián
Teruel, Milagro
Alonso i Alemany, Laura
author_role author
author2 Teruel, Milagro
Alonso i Alemany, Laura
author2_role author
author
dc.subject.none.fl_str_mv Natural language processing
Active learning
Semi-supervised learning
topic Natural language processing
Active learning
Semi-supervised learning
dc.description.none.fl_txt_mv Ponencia presentada en la 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, del 25 al 31 de julio de 2015.
Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.
Fil: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Otras Ciencias de la Computación e Información
description Ponencia presentada en la 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, del 25 al 31 de julio de 2015.
publishDate 2015
dc.date.none.fl_str_mv 2015
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11086/22132
url http://hdl.handle.net/11086/22132
dc.language.none.fl_str_mv eng
language eng
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositorio Digital Universitario (UNC)
instname:Universidad Nacional de Córdoba
instacron:UNC
reponame_str Repositorio Digital Universitario (UNC)
collection Repositorio Digital Universitario (UNC)
instname_str Universidad Nacional de Córdoba
instacron_str UNC
institution UNC
repository.name.fl_str_mv Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdoba
repository.mail.fl_str_mv oca.unc@gmail.com
_version_ 1844618982186811392
score 13.070432