Machine Learning Models Applied to the Analysis of Results from the Educational Quality Assessment Operation: "Aprender-2018"

Authors: Farías, Andrés Francisco; Montejano, Germán Antonio; Garis, Ana Gabriela; Farías, Andrés Alejandro; Farías, Sebastián Javier
Publication date: 2025-08
Language: English
Resource type: conference paper
Status: published version
Description: This paper proposes applying machine learning models to the results of "Aprender", the standardized assessment tests implemented in Argentina to measure language and mathematics performance in primary and secondary school. The study used data from the 2018 edition of the sixth-grade primary education assessment and analyzed language and mathematics performance; the results are presented in this article. A preliminary feature selection was performed first, followed by a preselection of candidate classifiers from the Python library Scikit-Learn (sklearn): Extra Tree Classifier, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, and KNeighbors Classifier. Among these, the model with the highest accuracy was identified. The datasets were preprocessed beforehand by replacing missing and negative values with the median of each column. Finally, the features that contributed most to the best results were identified.
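The abstract names the workflow steps but not their exact settings. The following is a minimal Python sketch of such a workflow with scikit-learn, under stated assumptions: the median used for imputation is computed from the valid (non-missing, non-negative) values only; `SelectKBest` with the ANOVA F-test stands in for the paper's unspecified feature-selection method; models are compared by 5-fold cross-validated accuracy with default hyperparameters; and features are ranked by random-forest importances. The helpers `impute_median`, `compare_models`, `top_features` and `make_synthetic_data` are hypothetical, and the data they generate is synthetic, not the Aprender-2018 dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# The five scikit-learn classifiers named in the abstract (default hyperparameters assumed).
MODELS = {
    "ExtraTreeClassifier": ExtraTreeClassifier(random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(),
}


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing and negative values with the median of each column.

    Assumption: the median is taken over the valid (non-missing, non-negative) values.
    """
    out = df.copy()
    for col in out.columns:
        valid = out[col].where(out[col] >= 0)  # negative codes become NaN
        out[col] = valid.fillna(valid.median())
    return out


def compare_models(X: pd.DataFrame, y: pd.Series, k: int = 5, cv: int = 5) -> dict:
    """Mean cross-validated accuracy per model, each preceded by a SelectKBest step.

    Assumption: SelectKBest with the ANOVA F-test stands in for the paper's
    feature-selection method; fitting it inside the pipeline keeps each
    validation fold out of the selection.
    """
    scores = {}
    for name, model in MODELS.items():
        pipe = make_pipeline(SelectKBest(f_classif, k=k), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    return scores


def top_features(X: pd.DataFrame, y: pd.Series, k: int = 5) -> pd.Series:
    """Rank the k selected features by random-forest impurity importance."""
    selector = SelectKBest(f_classif, k=k).fit(X, y)
    kept = X.columns[selector.get_support()]
    forest = RandomForestClassifier(random_state=0).fit(X[kept], y)
    return pd.Series(forest.feature_importances_, index=kept).sort_values(ascending=False)


def make_synthetic_data(seed: int = 0):
    """Hypothetical stand-in for the real data: 8 item columns coded 0-4."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(rng.integers(0, 5, size=(300, 8)).astype(float),
                     columns=[f"q{i}" for i in range(8)])
    y = (X["q0"] + X["q1"] >= 4).astype(int)  # synthetic "performance" label
    X.iloc[::25, 3] = np.nan  # inject missing answers
    X.iloc[::30, 4] = -9.0    # inject a negative "no response" code
    return impute_median(X), y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = compare_models(X, y)
    best = max(scores, key=scores.get)
    print(scores, "best:", best)
    print(top_features(X, y))
```

Placing `SelectKBest` inside the pipeline means the selection is refit on each training fold, so the reported accuracy is not inflated by features chosen with the validation data. `KNeighborsClassifier` exposes no `feature_importances_`, which is why the ranking step in this sketch uses a tree ensemble instead.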
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science (Ciencias Informáticas)
Machine learning models
Feature selection
Model selection
Education
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/ (CC BY-NC-SA 4.0)
Repository: SEDICI (UNLP)
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/190433

Format: application/pdf, pp. 199-208
Handle: http://sedici.unlp.edu.ar/handle/10915/190433
Alternate URL: https://revistas.unlp.edu.ar/JAIIO/article/view/19900 (ISSN 2451-7496)
