Combining semi-supervised and active learning to recognize minority senses in a new corpus
- Authors
- Cardellino, Cristian Adrián; Teruel, Milagro; Alonso i Alemany, Laura
- Publication year
- 2015
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Paper presented at the 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, July 25 to 31, 2015.
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Teruel, Milagro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
Affiliation: Alonso i Alemany, Laura. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.
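The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the distance-based confidence score, the toy two-dimensional data, and every name (`train_centroids`, `predict`, `grow_corpus`, `oracle`, `n_auto`, `n_manual`) are hypothetical choices made only for this example. In each iteration the least confident instances are sent to a human annotator (active learning), while the most confident ones are added with their predicted label (bootstrapping).

```python
import math
import random


def train_centroids(examples):
    """Return one mean vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Return (label, confidence); confidence is a normalized exp(-distance)."""
    scores = {lab: math.exp(-math.dist(vec, c)) for lab, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def grow_corpus(seed, pool, oracle, iterations=5, n_auto=2, n_manual=2):
    """Interleave active learning and bootstrapping over an unlabelled pool."""
    train, pool = list(seed), list(pool)
    for _ in range(iterations):
        if not pool:
            break
        model = train_centroids(train)
        # Rank pool indices from least to most confident prediction.
        order = sorted(range(len(pool)), key=lambda i: predict(model, pool[i])[1])
        manual = order[:n_manual]                  # uncertain: ask the annotator
        rest = order[n_manual:]
        auto = rest[max(0, len(rest) - n_auto):]   # confident: trust the model
        train += [(pool[i], oracle(pool[i])) for i in manual]
        train += [(pool[i], predict(model, pool[i])[0]) for i in auto]
        used = set(manual) | set(auto)
        pool = [v for i, v in enumerate(pool) if i not in used]
    return train_centroids(train), train


def sample(center, n):
    """Draw n toy 2-D points around a center (assumed data, for illustration)."""
    return [(random.gauss(center[0], 0.5), random.gauss(center[1], 0.5))
            for _ in range(n)]


random.seed(0)
# Seed corpus biased towards the majority class: six "major", one "minor".
seed_corpus = [(v, "major") for v in sample((0, 0), 6)] + [((3.0, 3.0), "minor")]
# The new corpus is balanced, unlike the seed.
pool = sample((0, 0), 10) + sample((3, 3), 10)


def oracle(vec):
    """Stand-in for a human annotator on the toy data."""
    return "minor" if vec[0] + vec[1] > 3 else "major"


model, train = grow_corpus(seed_corpus, pool, oracle)
```

In this toy setting the seed corpus contains a single minority example, and the loop gradually adds minority instances from the pool, some labelled by the oracle and some by the model. How many instances to label manually versus automatically per iteration is a design choice that the sketch exposes as `n_manual` and `n_auto`.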
Other Computer and Information Sciences
- Subject
- Natural language processing
Active learning
Semi-supervised learning
- Access level
- open access
- Terms of use
- Repository
- Institution
- Universidad Nacional de Córdoba
- OAI identifier
- oai:rdu.unc.edu.ar:11086/22132
id | RDUUNC_4e8e104f7022932d76730ed865faa184 |
---|---|
oai_identifier_str | oai:rdu.unc.edu.ar:11086/22132 |
network_acronym_str | RDUUNC |
repository_id_str | 2572 |
network_name_str | Repositorio Digital Universitario (UNC) |
dc.title.none.fl_str_mv | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
title | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
spellingShingle | Combining semi-supervised and active learning to recognize minority senses in a new corpus; Cardellino, Cristian Adrián; Natural language processing; Active learning; Semi-supervised learning |
title_short | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
title_full | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
title_fullStr | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
title_full_unstemmed | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
title_sort | Combining semi-supervised and active learning to recognize minority senses in a new corpus |
dc.creator.none.fl_str_mv | Cardellino, Cristian Adrián; Teruel, Milagro; Alonso i Alemany, Laura |
author | Cardellino, Cristian Adrián |
author_facet | Cardellino, Cristian Adrián; Teruel, Milagro; Alonso i Alemany, Laura |
author_role | author |
author2 | Teruel, Milagro; Alonso i Alemany, Laura |
author2_role | author; author |
dc.subject.none.fl_str_mv | Natural language processing; Active learning; Semi-supervised learning |
topic | Natural language processing; Active learning; Semi-supervised learning |
description | Paper presented at the 24th International Joint Conference on Artificial Intelligence. Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software. Buenos Aires, Argentina, July 25 to 31, 2015. |
publishDate | 2015 |
dc.date.none.fl_str_mv | 2015 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11086/22132 |
url | http://hdl.handle.net/11086/22132 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositorio Digital Universitario (UNC); instname:Universidad Nacional de Córdoba; instacron:UNC |
reponame_str | Repositorio Digital Universitario (UNC) |
collection | Repositorio Digital Universitario (UNC) |
instname_str | Universidad Nacional de Córdoba |
instacron_str | UNC |
institution | UNC |
repository.name.fl_str_mv | Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdoba |
repository.mail.fl_str_mv | oca.unc@gmail.com |