Leveraging Large Language Models for Ontology-Based Data Access: A Preliminary Analysis

Authors
Gómez, Sergio Alejandro; Fillottrani, Pablo Rubén
Publication year
2024
Language
English
Resource type
conference paper
Status
published version
Description
In Ontology-Based Data Access (OBDA), we study how to represent legacy data sources using ontologies. This provides a modern, distributed, uniform data representation format that supports intelligent querying and processing. The task requires developing software that interprets the data and expresses it as ontologies, which takes considerable time. Large language models (LLMs), on the other hand, have recently proven effective at generating solutions from input that an end user specifies in natural language. In this paper, we explore the potential of LLMs to perform OBDA automatically. Our research hypothesis is that it is possible to use an LLM tool like ChatGPT to perform OBDA. To test it, we studied ChatGPT's responses to different problems associated with OBDA. We found that ChatGPT is able to generate ontologies from free text as well as from tables expressed as text or in CSV format. ChatGPT is also able to generate SPARQL queries, and it successfully expresses relational tables as ontologies, correcting violations of integrity constraints when appropriately directed.
Red de Universidades con Carreras en Informática
Subject
Computer Science
Ontology-Based Data Access
Large Language Models
Ontologies
CSV
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/176820

Publication date
2024-10
Format
application/pdf
Pages
996-1005
Handle
http://sedici.unlp.edu.ar/handle/10915/176820
ISBN
978-950-34-2428-5
Reference
info:eu-repo/semantics/reference/hdl/10915/172755
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Repository contact
alira@sedici.unlp.edu.ar