Similarity Searching using Hybrid Technique Permutation Graph and Clustering

Authors
Rocha, Gerardo; Figueroa, Karina; Reyes, Nora Susana
Publication year
2025
Language
English
Resource type
conference paper
Status
published version
Description
Similarity search aims to find the database elements most similar to a given query. One way to model the problem is to consider only the distances between the query and the database elements, which allows it to be defined over a metric space. The advantage of this model is that it does not require large amounts of training data, the need for which is a major challenge today. The naive strategy is to compare the query against every element in the database; however, when the distance function is computationally expensive, this consumes considerable time and resources. This work proposes building a data structure over the database and searching on it, in order to reduce the number of distance calculations. In particular, we propose combining strategies that have proven efficient: permutation-based algorithms and algorithms based on clustering and graphs. Experiments show reductions of up to 20% in the number of distance calculations required by the permutation-based algorithm.
Red de Universidades con Carreras en Informática
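To make the cost model in the description concrete, the following is a minimal sketch of a generic permutation-based search that counts distance evaluations. It is not the hybrid structure proposed in the paper: the function names, the Euclidean distance, the random choice of permutants, the Spearman footrule ordering, and the fixed verification budget are all assumptions made for illustration.

```python
import random

def euclidean(a, b):
    """Stand-in for an expensive metric distance (assumption for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class CountingDistance:
    """Wraps a distance function and counts how often it is evaluated."""
    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)

def ranks(obj, permutants, dist):
    """rank[i] = position of permutant i when permutants are sorted by distance to obj."""
    order = sorted(range(len(permutants)), key=lambda i: dist(obj, permutants[i]))
    rank = [0] * len(permutants)
    for pos, i in enumerate(order):
        rank[i] = pos
    return rank

def footrule(r1, r2):
    """Spearman footrule: cheap dissimilarity between two permutations."""
    return sum(abs(a - b) for a, b in zip(r1, r2))

def build_index(db, permutants, dist):
    """Store one permutation per database object (done once, offline)."""
    return [ranks(obj, permutants, dist) for obj in db]

def knn_permutation(query, db, index, permutants, dist, k, budget):
    """Approximate k-NN: verify only the `budget` most promising objects."""
    q_ranks = ranks(query, permutants, dist)  # len(permutants) distance calls
    promising = sorted(range(len(db)), key=lambda i: footrule(q_ranks, index[i]))
    scored = sorted((dist(query, db[i]), i) for i in promising[:budget])
    return [i for _, i in scored[:k]]

# Hypothetical usage on synthetic 2-D points.
rng = random.Random(0)
db = [(rng.random(), rng.random()) for _ in range(1000)]
permutants = rng.sample(db, 16)
index = build_index(db, permutants, euclidean)  # build cost is not counted here
dist = CountingDistance(euclidean)
result = knn_permutation((0.5, 0.5), db, index, permutants, dist, k=5, budget=100)
print(dist.calls)  # 16 permutant distances + 100 verified candidates = 116
```

An exhaustive scan of the same database would evaluate the distance 1000 times per query, so the counter shows where the savings come from. The answer is approximate: a true nearest neighbor that falls outside the verified candidates is missed.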
Subject
Computer Science
Similarity Search
Metric Space
Distance Calculations
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/191306

Publication date
2025-10
URL
http://sedici.unlp.edu.ar/handle/10915/191306
ISBN
978-987-8258-99-7
Reference
hdl:10915/189846
Format
application/pdf
Pages
527-537