On the class distribution labelling step sensitivity of co-training

Authors
Matsubara, Edson T.; Monard, Maria C.; Prati, Ronaldo
Publication year
2006
Language
English
Resource type
conference paper
Status
published version
Description
Co-training can learn from datasets with a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm in which examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set. However, because the number of initial labelled examples is often small, we do not have reliable estimates of the underlying population that generated the data. In this work we argue that the proportion in which examples are labelled is a key parameter of co-training. Furthermore, we conducted a series of experiments to investigate how the proportion in which examples are labelled in each step influences co-training performance. The results show that co-training should be used with care in challenging domains.
IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining
Red de Universidades con Carreras en Informática (RedUNCI)
Subject
Computer Science
iterative algorithm
label
challenging domains
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/23900
URL
http://sedici.unlp.edu.ar/handle/10915/23900
ISBN
0-387-34654-6
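Illustrative sketch
The abstract describes co-training as a loop in which classifiers trained on two views of the data take turns labelling their most confident unlabelled examples, and it argues that the class proportion used in that labelling step is a key parameter. As a rough illustration of that setting only (not the authors' implementation; the paper's exact algorithm is not reproduced here), the Python sketch below makes the proportion explicit. The function name co_training and the parameters n_pos and n_neg (how many positive and negative examples each view labels per iteration) are hypothetical names introduced here, the GaussianNB base learner is an arbitrary stand-in, and binary {0, 1} labels are assumed.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_training(X1, X2, y, labelled, n_pos=1, n_neg=1, n_iter=10):
    # Hypothetical sketch. X1, X2: the two feature views of the same
    # examples; only y[labelled] is trusted, the rest are filled in by
    # the loop itself. n_pos / n_neg fix the class distribution of the
    # labelling step: how many positives/negatives each view labels
    # per iteration. Binary {0, 1} labels are assumed.
    L = list(labelled)
    initially_labelled = set(L)
    U = [i for i in range(len(y)) if i not in initially_labelled]
    y_work = np.asarray(y).copy()
    h1, h2 = GaussianNB(), GaussianNB()
    for _ in range(n_iter):
        if not U:
            break
        h1.fit(X1[L], y_work[L])
        h2.fit(X2[L], y_work[L])
        picked = []
        for h, X in ((h1, X1), (h2, X2)):
            proba = h.predict_proba(X[U])
            for cls, k in ((1, n_pos), (0, n_neg)):
                col = list(h.classes_).index(cls)
                # take this view's k most confident examples for cls
                for j in np.argsort(proba[:, col])[::-1][:k]:
                    picked.append((U[j], cls))
        for i, cls in picked:
            if i in U:                        # both views may pick the same one
                U.remove(i)
                L.append(i)
                y_work[i] = cls               # pseudo-label it and move it to L
    h1.fit(X1[L], y_work[L])                  # final classifier on view 1
    return h1

# Toy usage on synthetic two-view data.
rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
X = rng.normal(loc=y[:, None] * 1.5, size=(200, 4))
X1, X2 = X[:, :2], X[:, 2:]                   # treat the halves as two views
model = co_training(X1, X2, y, labelled=[0, 1, 100, 101])

Under this reading, the sensitivity question the paper's experiments probe is how performance degrades as the n_pos : n_neg ratio drifts from the true class distribution, which, with few initial labelled examples, is hard to estimate: an early mismatch injects wrongly labelled examples that every later iteration then trains on.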
