Evaluating Large Language Models for the Generation of Unit Tests with Equivalence Partitions and Boundary Values

Authors
Rodríguez, Martín; Rossi, Gustavo Héctor; Fernández, Alejandro
Year of publication
2025
Language
English
Resource type
conference paper
Status
published version
Description
The design and implementation of unit tests is a complex task that many programmers neglect. This research evaluates the potential of Large Language Models (LLMs) in automatically generating test cases, comparing them with manual tests. An optimized prompt was developed that integrates code and requirements, covering critical cases such as equivalence partitions and boundary values. The strengths and weaknesses of LLMs versus trained programmers were compared through quantitative metrics and manual qualitative analysis. The results show that the effectiveness of LLMs depends on well-designed prompts, robust implementation, and precise requirements. Although flexible and promising, LLMs still require human supervision. This work highlights the importance of manual qualitative analysis as an essential complement to automation in unit test evaluation.
Subject
Computer and Information Sciences
Evaluation
Unit Testing
LLM
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-nd/4.0/
Repository
CIC Digital (CICBA)
Institution
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI Identifier
oai:digital.cic.gba.gob.ar:11746/12673
URL
https://digital.cic.gba.gob.ar/handle/11746/12673
Format
application/pdf
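
As an aside for readers unfamiliar with the test-design techniques named in the title and abstract, the following minimal sketch (in Python, not taken from the paper) shows what unit tests based on equivalence partitions and boundary values look like; the classify_age function and its age ranges are hypothetical and assumed purely for illustration.

    import unittest

    def classify_age(age):
        # Hypothetical function under test with three valid partitions
        # (0-12 child, 13-17 teen, 18+ adult) and one invalid partition (age < 0).
        if age < 0:
            raise ValueError("age must be non-negative")
        if age <= 12:
            return "child"
        if age <= 17:
            return "teen"
        return "adult"

    class TestClassifyAge(unittest.TestCase):
        def test_equivalence_partitions(self):
            # One representative value from each equivalence partition.
            self.assertEqual(classify_age(6), "child")
            self.assertEqual(classify_age(15), "teen")
            self.assertEqual(classify_age(40), "adult")
            with self.assertRaises(ValueError):
                classify_age(-5)  # invalid partition

        def test_boundary_values(self):
            # Values at and immediately around each partition boundary.
            self.assertEqual(classify_age(0), "child")
            self.assertEqual(classify_age(12), "child")
            self.assertEqual(classify_age(13), "teen")
            self.assertEqual(classify_age(17), "teen")
            self.assertEqual(classify_age(18), "adult")

    if __name__ == "__main__":
        unittest.main()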
