Learning to detect spam messages

Authors: Gil Costa, Graciela Verónica; Errecalde, Marcelo Luis; Taranilla, María Teresa
Year of publication: 2005
Language: English
Resource type: conference document
Status: published version
Description: The problem of unwanted e-mails (spam messages) has been growing for years. Different methods have been proposed to deal with this problem, including blacklists of known spammers, handcrafted rules, and machine learning techniques. In this paper we investigate the performance of the k Nearest Neighbours (k-NN) method in spam detection tasks. To this end, a number of different document encodings were tested. Moreover, we study how reducing the vocabulary size affects this task. In the experimental design, different k values were considered, and results were analyzed with respect to a public mailing list and personal e-mail collections. The experiments showed that results with public mailing lists tend to be very optimistic and should not be considered representative of those expected with personal user accounts.
VI Workshop de Agentes y Sistemas Inteligentes (WASI)
Red de Universidades con Carreras en Informática (RedUNCI)
Subject: Computer Science
Electronic mail
Message sending
Information filtering
spam
anti-spam filtering
automated text categorization
machine learning
k-NN
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/22957
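
The k-NN approach named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the term-frequency encoding, the cosine similarity, the frequency-based vocabulary cut, and the toy messages below are all assumptions chosen only to show how a document encoding, a reduced vocabulary, and a k value fit together.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not given here.
    return text.lower().split()


def build_vocabulary(docs, max_size):
    """Keep only the max_size most frequent terms (one simple form of vocabulary reduction)."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return {term for term, _ in counts.most_common(max_size)}


def encode(text, vocabulary):
    """Term-frequency encoding restricted to the chosen vocabulary (assumed encoding)."""
    return Counter(tok for tok in tokenize(text) if tok in vocabulary)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(message, training, vocabulary, k=3):
    """Label a message by majority vote among its k most similar training messages."""
    query = encode(message, vocabulary)
    scored = sorted(
        ((cosine(query, encode(text, vocabulary)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim your money", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
    ("lunch on monday with the project team", "ham"),
]
VOCABULARY = build_vocabulary([text for text, _ in TRAINING], max_size=20)

if __name__ == "__main__":
    print(knn_classify("claim your free money", TRAINING, VOCABULARY, k=3))
    print(knn_classify("agenda for the project meeting", TRAINING, VOCABULARY, k=3))
```

Varying `max_size` and `k` in this sketch mirrors, in a very reduced form, the two experimental factors the description mentions: vocabulary size and the number of neighbours.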

Publication date: 2005-10
Format: application/pdf
URL: http://sedici.unlp.edu.ar/handle/10915/22957
License: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
Repository: SEDICI (UNLP) - Universidad Nacional de La Plata
Repository contact: alira@sedici.unlp.edu.ar
