On the class distribution labelling step sensitivity of co-training

Authors
Matsubara, Edson T.; Monard, Maria C.; Prati, Ronaldo
Publication year
2006
Language
English
Resource type
conference paper
Status
published version
Description
Co-training can learn from datasets with a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm in which examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set. However, because the number of initial labelled examples is often small, we do not have reliable estimates of the underlying population that generated the data. In this work we argue that the proportion in which examples are labelled is a key parameter of co-training. Furthermore, we conducted a series of experiments to investigate how the proportion in which examples are labelled in each step influences co-training performance. The results show that co-training should be used with care in challenging domains.
IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining
Red de Universidades con Carreras en Informática (RedUNCI)
Subject
Computer Science
iterative algorithm
label
challenging domains
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/23900
URL
http://sedici.unlp.edu.ar/handle/10915/23900
ISBN
0-387-34654-6
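Illustrative sketch
The abstract describes co-training as a loop in which classifiers trained on two views of the data take turns labelling their most confident unlabelled examples, and it argues that the class proportion used in that labelling step is a key parameter. As a rough illustration of that setting only (not the authors' implementation; the paper's exact algorithm is not reproduced here), the Python sketch below makes the proportion explicit. The function name co_training and the parameters n_pos and n_neg (how many positive and negative examples each view labels per iteration) are hypothetical names introduced here, the GaussianNB base learner is an arbitrary stand-in, and binary {0, 1} labels are assumed.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, labelled, n_pos=1, n_neg=1, n_iter=10):
    # Hypothetical sketch. X1, X2: the two feature views of the same
    # examples; only y[labelled] is trusted, the rest are filled in by
    # the loop itself. n_pos / n_neg fix the class distribution of the
    # labelling step: how many positives/negatives each view labels
    # per iteration. Binary {0, 1} labels are assumed.
    L = list(labelled)
    initially_labelled = set(L)
    U = [i for i in range(len(y)) if i not in initially_labelled]
    y_work = np.asarray(y).copy()
    h1, h2 = GaussianNB(), GaussianNB()
    for _ in range(n_iter):
        if not U:
            break
        h1.fit(X1[L], y_work[L])
        h2.fit(X2[L], y_work[L])
        picked = []
        for h, X in ((h1, X1), (h2, X2)):
            proba = h.predict_proba(X[U])
            for cls, k in ((1, n_pos), (0, n_neg)):
                col = list(h.classes_).index(cls)
                # take this view's k most confident examples for cls
                for j in np.argsort(proba[:, col])[::-1][:k]:
                    picked.append((U[j], cls))
        for i, cls in picked:
            if i in U:                        # both views may pick the same one
                U.remove(i)
                L.append(i)
                y_work[i] = cls               # pseudo-label it and move it to L
    h1.fit(X1[L], y_work[L])                  # final classifier on view 1
    return h1

# Toy usage on synthetic two-view data.
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
X = rng.normal(loc=y[:, None] * 1.5, size=(200, 4))
X1, X2 = X[:, :2], X[:, 2:]                   # treat the halves as two views
model = co_training(X1, X2, y, labelled=[0, 1, 100, 101])

Under this reading, the sensitivity question the paper's experiments probe is how performance degrades as the n_pos : n_neg ratio drifts from the true class distribution, which, with few initial labelled examples, is hard to estimate: an early mismatch injects wrongly labelled examples that every later iteration then trains on.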
