Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams

Authors: Rodríguez-Betancourt, Esteban; Casasola-Murillo, Edgar
Publication year: 2024
Language: English
Resource type: conference paper
Status: published version
Description: With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
Databases
Indexes
Natural Language Processing
Word Embeddings
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/177179

Publication date: 2024-08
Format: application/pdf
Pages: 150-157
Handle: http://sedici.unlp.edu.ar/handle/10915/177179
Alternate identifier (URL): https://revistas.unlp.edu.ar/JAIIO/article/view/17913
Alternate identifier (ISSN): 2451-7496
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Repository name: SEDICI (UNLP)
Repository contact: alira@sedici.unlp.edu.ar
