Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection

Autores
Catania, Carlos Adrián; García Garino, Carlos; Bromberg, Facundo
Año de publicación
2010
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
Supervised learning classifiers have proved to be a viable solution in the network intrusion detection field. In practice, however, it is difficult to obtain the required labeled data for implementing these approaches. An alternative approach that avoids the need of labeled datasets consists of using classifiers following a semi-supervised strategy. These classifiers use in their learning process information from labeled and unlabeled datapoints. One of these semi-supervised approaches, originally applied to text classification, combines a naïve Bayes (NB) classifier with the expectation maximization (EM) algorithm. Despite some differences, network intrusion detection shares many of the characteristics of the document classification problem. It is extremely hard to obtain labeled data whereas there are plenty of unlabeled data easily accessible. This work aims to determine the viability of applying semi-supervised techniques to network intrusion detection, with special focus on the combination of NB classifier and EM. A set of experiments conducted on the 1998 DARPA dataset show using EM with unlabeled data can provide significant benefits in classification performance, reducing the size of required labeled data by 90%.
Sociedad Argentina de Informática e Investigación Operativa
Materia
Ciencias Informáticas
Intrusion Detection Systems
Semi-supervised Learning
Expectation Maximization
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repositorio
SEDICI (UNLP)
Institución
Universidad Nacional de La Plata
OAI Identificador
oai:sedici.unlp.edu.ar:10915/152809

id SEDICI_08a0906716effd51dbcba2fdcc54d55a
oai_identifier_str oai:sedici.unlp.edu.ar:10915/152809
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
spelling Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion DetectionCatania, Carlos AdriánGarcía Garino, CarlosBromberg, FacundoCiencias InformáticasIntrusion Detection SystemsSemi-supervised LearningExpectation MaximizationSupervised learning classifiers have proved to be a viable solution in the network intrusion detection field. In practice, however, it is difficult to obtain the required labeled data for implementing these approaches. An alternative approach that avoids the need of labeled datasets consists of using classifiers following a semi-supervised strategy. These classifiers use in their learning process information from labeled and unlabeled datapoints. One of these semi-supervised approaches, originally applied to text classification, combines a naïve Bayes (NB) classifier with the expectation maximization (EM) algorithm. Despite some differences, network intrusion detection shares many of the characteristics of the document classification problem. It is extremely hard to obtain labeled data whereas there are plenty of unlabeled data easily accessible. This work aims to determine the viability of applying semi-supervised techniques to network intrusion detection, with special focus on the combination of NB classifier and EM. A set of experiments conducted on the 1998 DARPA dataset show using EM with unlabeled data can provide significant benefits in classification performance, reducing the size of required labeled data by 90%.Sociedad Argentina de Informática e Investigación Operativa2010info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf175-186http://sedici.unlp.edu.ar/handle/10915/152809enginfo:eu-repo/semantics/altIdentifier/url/http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-16.pdfinfo:eu-repo/semantics/altIdentifier/issn/1850-2784info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:39:21Zoai:sedici.unlp.edu.ar:10915/152809Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:39:21.789SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
title Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
spellingShingle Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
Catania, Carlos Adrián
Ciencias Informáticas
Intrusion Detection Systems
Semi-supervised Learning
Expectation Maximization
title_short Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
title_full Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
title_fullStr Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
title_full_unstemmed Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
title_sort Application of a Bayesian Semi-supervised Learning Strategy to Network Intrusion Detection
dc.creator.none.fl_str_mv Catania, Carlos Adrián
García Garino, Carlos
Bromberg, Facundo
author Catania, Carlos Adrián
author_facet Catania, Carlos Adrián
García Garino, Carlos
Bromberg, Facundo
author_role author
author2 García Garino, Carlos
Bromberg, Facundo
author2_role author
author
dc.subject.none.fl_str_mv Ciencias Informáticas
Intrusion Detection Systems
Semi-supervised Learning
Expectation Maximization
topic Ciencias Informáticas
Intrusion Detection Systems
Semi-supervised Learning
Expectation Maximization
dc.description.none.fl_txt_mv Supervised learning classifiers have proved to be a viable solution in the network intrusion detection field. In practice, however, it is difficult to obtain the required labeled data for implementing these approaches. An alternative approach that avoids the need of labeled datasets consists of using classifiers following a semi-supervised strategy. These classifiers use in their learning process information from labeled and unlabeled datapoints. One of these semi-supervised approaches, originally applied to text classification, combines a naïve Bayes (NB) classifier with the expectation maximization (EM) algorithm. Despite some differences, network intrusion detection shares many of the characteristics of the document classification problem. It is extremely hard to obtain labeled data whereas there are plenty of unlabeled data easily accessible. This work aims to determine the viability of applying semi-supervised techniques to network intrusion detection, with special focus on the combination of NB classifier and EM. A set of experiments conducted on the 1998 DARPA dataset show using EM with unlabeled data can provide significant benefits in classification performance, reducing the size of required labeled data by 90%.
Sociedad Argentina de Informática e Investigación Operativa
description Supervised learning classifiers have proved to be a viable solution in the network intrusion detection field. In practice, however, it is difficult to obtain the required labeled data for implementing these approaches. An alternative approach that avoids the need of labeled datasets consists of using classifiers following a semi-supervised strategy. These classifiers use in their learning process information from labeled and unlabeled datapoints. One of these semi-supervised approaches, originally applied to text classification, combines a naïve Bayes (NB) classifier with the expectation maximization (EM) algorithm. Despite some differences, network intrusion detection shares many of the characteristics of the document classification problem. It is extremely hard to obtain labeled data whereas there are plenty of unlabeled data easily accessible. This work aims to determine the viability of applying semi-supervised techniques to network intrusion detection, with special focus on the combination of NB classifier and EM. A set of experiments conducted on the 1998 DARPA dataset show using EM with unlabeled data can provide significant benefits in classification performance, reducing the size of required labeled data by 90%.
publishDate 2010
dc.date.none.fl_str_mv 2010
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/152809
url http://sedici.unlp.edu.ar/handle/10915/152809
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://39jaiio.sadio.org.ar/sites/default/files/39jaiio-asai-16.pdf
info:eu-repo/semantics/altIdentifier/issn/1850-2784
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
175-186
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
reponame_str SEDICI (UNLP)
collection SEDICI (UNLP)
instname_str Universidad Nacional de La Plata
instacron_str UNLP
institution UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1844616267617533952
score 13.070432