Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

Authors
Rodríguez, Martín; Rossi, Gustavo Héctor; Fernández, Alejandro
Year of publication
2025
Language
English
Resource type
conference paper
Status
published version
Description
The design and implementation of unit tests is a complex task that many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) in automatically generating test cases, comparing them with manual tests. An optimized prompt was developed that integrates code and requirements, covering critical cases such as equivalence partitions and boundary values. The strengths and weaknesses of LLMs versus trained programmers were compared through quantitative metrics and manual qualitative analysis. The results show that the effectiveness of LLMs depends on well-designed prompts, robust implementation, and precise requirements. Although flexible and promising, LLMs still require human supervision. This work highlights the importance of manual qualitative analysis as an essential complement to automation in unit test evaluation.
Subject
Computer and Information Sciences
Evaluation
Unit Testing
LLM
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-nd/4.0/
Repository
CIC Digital (CICBA)
Institution
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI Identifier
oai:digital.cic.gba.gob.ar:11746/12673
URL
https://digital.cic.gba.gob.ar/handle/11746/12673
Format
application/pdf
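
As an aside for readers unfamiliar with the test-design techniques named in the title and abstract, the following minimal sketch (in Python, not taken from the paper) shows what unit tests based on equivalence partitions and boundary values look like; the classify_age function and its age ranges are hypothetical and assumed purely for illustration.

    import unittest

    def classify_age(age):
        # Hypothetical function under test with three valid partitions
        # (0-12 child, 13-17 teen, 18+ adult) and one invalid partition (age < 0).
        if age < 0:
            raise ValueError("age must be non-negative")
        if age <= 12:
            return "child"
        if age <= 17:
            return "teen"
        return "adult"

    class TestClassifyAge(unittest.TestCase):
        def test_equivalence_partitions(self):
            # One representative value from each equivalence partition.
            self.assertEqual(classify_age(6), "child")
            self.assertEqual(classify_age(15), "teen")
            self.assertEqual(classify_age(40), "adult")
            with self.assertRaises(ValueError):
                classify_age(-5)  # invalid partition

        def test_boundary_values(self):
            # Values at and immediately around each partition boundary.
            self.assertEqual(classify_age(0), "child")
            self.assertEqual(classify_age(12), "child")
            self.assertEqual(classify_age(13), "teen")
            self.assertEqual(classify_age(17), "teen")
            self.assertEqual(classify_age(18), "adult")

    if __name__ == "__main__":
        unittest.main()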
