On the class distribution labelling step sensitivity of co-training
- Authors
- Matsubara, Edson T.; Monard, Maria C.; Prati, Ronaldo
- Year of publication
- 2006
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Co-training can learn from datasets that have a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm in which examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set. However, because the number of initial labelled examples is often small, we do not have reliable estimates of the underlying population that generated the data. In this work we claim that the proportion in which examples are labelled is a key parameter of co-training, and we report a series of experiments investigating how the proportion in which we label examples at each step influences co-training performance. The results show that co-training should be used with care in challenging domains. (A minimal code sketch of this per-step labelling is given after the record fields below.)
- IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining
- Red de Universidades con Carreras en Informática (RedUNCI)
- Subject
- Ciencias Informáticas
- iterative algorithm
- label
- challenging domains
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/23900
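The abstract turns on a single mechanism: at each iteration, each view's classifier promotes a fixed number of its most confident unlabelled examples into the labelled set, and the class proportion of that promotion is the parameter under study. Below is a minimal, hypothetical sketch of such a labelling step, not the authors' implementation: it assumes two feature views, a Gaussian Naive Bayes base learner from scikit-learn, both classes present among the initial labels, and invented parameter names `n_pos`/`n_neg` for the per-step class proportion.

```python
# A hypothetical co-training sketch (illustration only, not the paper's code).
# Assumes binary labels: 1 = positive, 0 = negative, -1 = unlabelled, and that
# both classes appear among the initial labelled examples.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X_view1, X_view2, y, n_pos, n_neg, n_iter=30):
    """Grow the labelled set from two views; n_pos : n_neg is the per-step
    class-distribution labelling proportion whose sensitivity the paper studies."""
    y = y.copy()
    for _ in range(n_iter):
        for X in (X_view1, X_view2):
            labelled = y != -1
            unlabelled = np.flatnonzero(y == -1)
            if unlabelled.size == 0:
                return y
            clf = GaussianNB().fit(X[labelled], y[labelled])
            proba = clf.predict_proba(X[unlabelled])
            pos_col = list(clf.classes_).index(1)
            # Rank unlabelled examples by confidence in the positive class:
            # the descending order starts with confident positives, the
            # ascending order with confident negatives.
            order = np.argsort(proba[:, pos_col])
            for idx in unlabelled[order[::-1]][:n_pos]:
                y[idx] = 1
            for idx in unlabelled[order][:n_neg]:
                if y[idx] == -1:  # skip anything just labelled positive
                    y[idx] = 0
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X1 = rng.normal(size=(200, 5))
    X2 = X1 + rng.normal(scale=0.1, size=(200, 5))  # a correlated second view
    y = np.full(200, -1)
    y[:2], y[2:8] = 1, 0  # two positive and six negative seed labels
    grown = co_train(X1, X2, y, n_pos=1, n_neg=3)
    print(np.bincount(grown + 1))  # counts of [-1, 0, 1] after co-training
```

As a point of reference, Blum and Mitchell's original co-training promoted one positive and three negative examples per iteration to roughly match their domain's class distribution; the paper's experiments examine how sensitive performance is to getting this ratio right.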
View full record metadata
| Field | Value |
|---|---|
| id | SEDICI_38c644f1cde1352030107ed59a7feaf9 |
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/23900 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title | On the class distribution labelling step sensitivity of co-training |
| dc.creator | Matsubara, Edson T.; Monard, Maria C.; Prati, Ronaldo |
| dc.subject | Ciencias Informáticas; iterative algorithm; label; challenging domains |
| dc.date | 2006-08 |
| dc.type | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| dc.identifier | http://sedici.unlp.edu.ar/handle/10915/23900 |
| dc.language | eng |
| dc.relation | info:eu-repo/semantics/altIdentifier/isbn/0-387-34654-6 |
| dc.rights | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/2.5/ar/; Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5) |
| dc.format | application/pdf |
| dc.source | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| repository.name | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail | alira@sedici.unlp.edu.ar |