Comparison of SVM and some older classification algorithms in text classification tasks

Authors
Colas, Fabrice; Brazdil, Pavel
Publication year
2006
Language
English
Resource type
conference paper
Status
published version
Description
Document classification has already been widely studied. Some studies compared feature selection techniques or feature space transformations, whereas others compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine (SVM), various studies showed that SVM outperforms other classification algorithms. So should we simply disregard other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important point is that we compared optimized versions of these algorithms. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. With suitable preprocessing, kNN continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. Naive Bayes also achieved good performance.
IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining
Red de Universidades con Carreras en Informática (RedUNCI)
Subject
Computer Science
document management
support vector machine
Model classification
Algorithms
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/23885

id SEDICI_afe57df1e295b21d0676ec694e5314d3
oai_identifier_str oai:sedici.unlp.edu.ar:10915/23885
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.title.none.fl_str_mv Comparison of SVM and some older classification algorithms in text classification tasks
dc.creator.none.fl_str_mv Colas, Fabrice
Brazdil, Pavel
dc.subject.none.fl_str_mv Computer Science
document management
support vector machine
Model classification
Algorithms
publishDate 2006
dc.date.none.fl_str_mv 2006-08
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/23885
url http://sedici.unlp.edu.ar/handle/10915/23885
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/0-387-34654-6
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar