Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques

Autores: Guerra, Jorge; Catania, Carlos
Año de publicación: 2017
Idioma: inglés
Tipo de recurso: documento de conferencia
Estado: versión publicada
Descripción: The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to help in the identification of suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles in the developing of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for the process of automatically learning models but also for testing its performance. Recently, RiskID emerged with the goal of providing to the network security community a collaborative tool for helping the labeling process. Through the use of visual and statistical techniques, RiskID facilitates to the user the generation of labeled datasets from real connections. In this article, we present a machine learning extension for RiskID, to help the user in the malware identification process. A preliminary study shows that as the size of labeled data increases, the use of machine learning models can be a valuable tool during the labeling process of future traffic connections.
VI Workshop de Seguridad Informática (WSI).
Red de Universidades con Carreras en Informática (RedUNCI)
Materia: Ciencias Informáticas
machine learning
dataset generation
network security
Nivel de accesibilidad: acceso abierto
Condiciones de uso: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repositorio
Institución: Universidad Nacional de La Plata
OAI Identificador: oai:sedici.unlp.edu.ar:10915/63933

Acceder

id	SEDICI_485da7c85ef1da18fc81df3ae0e115e2
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/63933
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
spelling	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning TechniquesGuerra, JorgeCatania, CarlosCiencias Informáticasmachine learningdataset generationnetwork securityThe problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to help in the identification of suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles in the developing of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for the process of automatically learning models but also for testing its performance. Recently, RiskID emerged with the goal of providing to the network security community a collaborative tool for helping the labeling process. Through the use of visual and statistical techniques, RiskID facilitates to the user the generation of labeled datasets from real connections. In this article, we present a machine learning extension for RiskID, to help the user in the malware identification process. A preliminary study shows that as the size of labeled data increases, the use of machine learning models can be a valuable tool during the labeling process of future traffic connections.VI Workshop de Seguridad Informática (WSI).Red de Universidades con Carreras en Informática (RedUNCI)2017-10info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf1269-1278http://sedici.unlp.edu.ar/handle/10915/63933enginfo:eu-repo/semantics/altIdentifier/isbn/978-950-34-1539-9info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-11-26T09:47:06Zoai:sedici.unlp.edu.ar:10915/63933Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-11-26 09:47:06.382SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
spellingShingle	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques Guerra, Jorge Ciencias Informáticas machine learning dataset generation network security
title_short	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_full	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_fullStr	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_full_unstemmed	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_sort	Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
dc.creator.none.fl_str_mv	Guerra, Jorge Catania, Carlos
author	Guerra, Jorge
author_facet	Guerra, Jorge Catania, Carlos
author_role	author
author2	Catania, Carlos
author2_role	author
dc.subject.none.fl_str_mv	Ciencias Informáticas machine learning dataset generation network security
topic	Ciencias Informáticas machine learning dataset generation network security
dc.description.none.fl_txt_mv	The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to help in the identification of suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles in the developing of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for the process of automatically learning models but also for testing its performance. Recently, RiskID emerged with the goal of providing to the network security community a collaborative tool for helping the labeling process. Through the use of visual and statistical techniques, RiskID facilitates to the user the generation of labeled datasets from real connections. In this article, we present a machine learning extension for RiskID, to help the user in the malware identification process. A preliminary study shows that as the size of labeled data increases, the use of machine learning models can be a valuable tool during the labeling process of future traffic connections. VI Workshop de Seguridad Informática (WSI). Red de Universidades con Carreras en Informática (RedUNCI)
description	The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to help in the identification of suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles in the developing of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for the process of automatically learning models but also for testing its performance. Recently, RiskID emerged with the goal of providing to the network security community a collaborative tool for helping the labeling process. Through the use of visual and statistical techniques, RiskID facilitates to the user the generation of labeled datasets from real connections. In this article, we present a machine learning extension for RiskID, to help the user in the malware identification process. A preliminary study shows that as the size of labeled data increases, the use of machine learning models can be a valuable tool during the labeling process of future traffic connections.
publishDate	2017
dc.date.none.fl_str_mv	2017-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/63933
url	http://sedici.unlp.edu.ar/handle/10915/63933
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/isbn/978-950-34-1539-9
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 1269-1278
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
_version_	1849875895338139648
score	13.011256

Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques

Publicaciones similares