Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams

Autores
Rodríguez-Betancourt, Esteban; Casasola-Murillo, Edgar
Año de publicación
2024
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
Sociedad Argentina de Informática e Investigación Operativa
Materia
Ciencias Informáticas
Databases
Indexes
Natural Language Processing
Word Embeddings
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repositorio
SEDICI (UNLP)
Institución
Universidad Nacional de La Plata
OAI Identificador
oai:sedici.unlp.edu.ar:10915/177179

id SEDICI_d947843395957cd5049b9a3a5e33f512
oai_identifier_str oai:sedici.unlp.edu.ar:10915/177179
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
spelling Teaching SQL New Tricks: Efficient Vector Indexing with TrigramsRodríguez-Betancourt, EstebanCasasola-Murillo, EdgarCiencias InformáticasDatabasesIndexesNatural Language ProcessingWord EmbeddingsWith the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.Sociedad Argentina de Informática e Investigación Operativa2024-08info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf150-157http://sedici.unlp.edu.ar/handle/10915/177179enginfo:eu-repo/semantics/altIdentifier/url/https://revistas.unlp.edu.ar/JAIIO/article/view/17913info:eu-repo/semantics/altIdentifier/issn/2451-7496info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:47:49Zoai:sedici.unlp.edu.ar:10915/177179Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:47:49.528SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
title Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
spellingShingle Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
Rodríguez-Betancourt, Esteban
Ciencias Informáticas
Databases
Indexes
Natural Language Processing
Word Embeddings
title_short Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
title_full Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
title_fullStr Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
title_full_unstemmed Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
title_sort Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
dc.creator.none.fl_str_mv Rodríguez-Betancourt, Esteban
Casasola-Murillo, Edgar
author Rodríguez-Betancourt, Esteban
author_facet Rodríguez-Betancourt, Esteban
Casasola-Murillo, Edgar
author_role author
author2 Casasola-Murillo, Edgar
author2_role author
dc.subject.none.fl_str_mv Ciencias Informáticas
Databases
Indexes
Natural Language Processing
Word Embeddings
topic Ciencias Informáticas
Databases
Indexes
Natural Language Processing
Word Embeddings
dc.description.none.fl_txt_mv With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
Sociedad Argentina de Informática e Investigación Operativa
description With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
publishDate 2024
dc.date.none.fl_str_mv 2024-08
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/177179
url http://sedici.unlp.edu.ar/handle/10915/177179
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://revistas.unlp.edu.ar/JAIIO/article/view/17913
info:eu-repo/semantics/altIdentifier/issn/2451-7496
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
150-157
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
reponame_str SEDICI (UNLP)
collection SEDICI (UNLP)
instname_str Universidad Nacional de La Plata
instacron_str UNLP
institution UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1844616341498101760
score 13.070432