Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams
- Autores
- Rodríguez-Betancourt, Esteban; Casasola-Murillo, Edgar
- Año de publicación
- 2024
- Idioma
- inglés
- Tipo de recurso
- documento de conferencia
- Estado
- versión publicada
- Descripción
- With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
Sociedad Argentina de Informática e Investigación Operativa - Materia
-
Ciencias Informáticas
Databases
Indexes
Natural Language Processing
Word Embeddings - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repositorio
- Institución
- Universidad Nacional de La Plata
- OAI Identificador
- oai:sedici.unlp.edu.ar:10915/177179
Ver los metadatos del registro completo
id |
SEDICI_d947843395957cd5049b9a3a5e33f512 |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/177179 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
spelling |
Teaching SQL New Tricks: Efficient Vector Indexing with TrigramsRodríguez-Betancourt, EstebanCasasola-Murillo, EdgarCiencias InformáticasDatabasesIndexesNatural Language ProcessingWord EmbeddingsWith the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.Sociedad Argentina de Informática e Investigación Operativa2024-08info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf150-157http://sedici.unlp.edu.ar/handle/10915/177179enginfo:eu-repo/semantics/altIdentifier/url/https://revistas.unlp.edu.ar/JAIIO/article/view/17913info:eu-repo/semantics/altIdentifier/issn/2451-7496info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-nc-sa/4.0/Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:47:49Zoai:sedici.unlp.edu.ar:10915/177179Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:47:49.528SEDICI (UNLP) - Universidad Nacional de La Platafalse |
dc.title.none.fl_str_mv |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
title |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
spellingShingle |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams Rodríguez-Betancourt, Esteban Ciencias Informáticas Databases Indexes Natural Language Processing Word Embeddings |
title_short |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
title_full |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
title_fullStr |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
title_full_unstemmed |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
title_sort |
Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams |
dc.creator.none.fl_str_mv |
Rodríguez-Betancourt, Esteban Casasola-Murillo, Edgar |
author |
Rodríguez-Betancourt, Esteban |
author_facet |
Rodríguez-Betancourt, Esteban Casasola-Murillo, Edgar |
author_role |
author |
author2 |
Casasola-Murillo, Edgar |
author2_role |
author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas Databases Indexes Natural Language Processing Word Embeddings |
topic |
Ciencias Informáticas Databases Indexes Natural Language Processing Word Embeddings |
dc.description.none.fl_txt_mv |
With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications. Sociedad Argentina de Informática e Investigación Operativa |
description |
With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications. |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-08 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/177179 |
url |
http://sedici.unlp.edu.ar/handle/10915/177179 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://revistas.unlp.edu.ar/JAIIO/article/view/17913 info:eu-repo/semantics/altIdentifier/issn/2451-7496 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf 150-157 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1844616341498101760 |
score |
13.070432 |