Similarity Searching using Hybrid Technique Permutation Graph and Clustering

Authors: Rocha, Gerardo; Figueroa, Karina; Reyes, Nora Susana
Publication year: 2025
Language: English
Resource type: conference paper
Status: published version
Description: Similarity searches aim to find the elements of a database that are most similar to a query. One way to model the problem is to consider only the distance between the query and the elements of the database, which allows it to be defined over a metric space. The advantage is that large amounts of training data, the main challenge today, are not required. The obvious strategy is to compare the query against the entire database; however, with computationally expensive distances, this consumes considerable time and resources. This work proposes performing searches on a data structure in order to reduce the number of distance calculations. In particular, we propose combining strategies that have proven to be efficient: algorithms based on permutations and those based on clustering and graphs. Experiments show that we can achieve reductions of up to 20% in the number of distance calculations needed by the permutation-based algorithm.
Red de Universidades con Carreras en Informática
Subject: Computer Science
Similarity Search
Metric Space
Distance Calculations
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/191306
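
Illustrative sketch (not part of the catalogued record): the description contrasts comparing a query against the entire database with searching through an index that needs fewer distance calculations. The Python below shows that contrast on a generic permutation-based index. It is a minimal sketch under stated assumptions, not the authors' hybrid method: the clustering and graph components are not reproduced, and the Euclidean distance, the Spearman footrule ordering, the `budget` parameter, and all function names are hypothetical choices made for illustration.

```python
import math
import random


class CountingDistance:
    """Wrap a distance function and count how many times it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def euclidean(a, b):
    # Assumed metric for the demo; any metric distance could be used instead.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation(obj, pivots, dist):
    """rank[i] = position of pivot i when pivots are sorted by distance to obj."""
    order = sorted(range(len(pivots)), key=lambda i: dist(obj, pivots[i]))
    rank = [0] * len(pivots)
    for position, pivot_id in enumerate(order):
        rank[pivot_id] = position
    return rank


def footrule(rank_a, rank_b):
    """Spearman footrule: total displacement between two pivot rankings."""
    return sum(abs(a - b) for a, b in zip(rank_a, rank_b))


def build_index(db, pivots, dist):
    """Precompute one pivot permutation per database object (done offline)."""
    return [permutation(obj, pivots, dist) for obj in db]


def knn_exhaustive(db, query, k, dist):
    """Baseline: one distance evaluation per database object."""
    return sorted(range(len(db)), key=lambda i: dist(query, db[i]))[:k]


def knn_permutation(db, index, pivots, query, k, budget, dist):
    """Approximate k-NN: compare only the `budget` most promising objects."""
    query_rank = permutation(query, pivots, dist)  # len(pivots) evaluations
    promising = sorted(range(len(db)),
                       key=lambda i: footrule(query_rank, index[i]))[:budget]
    # Only the shortlisted objects are compared with the query: budget evaluations.
    return sorted(promising, key=lambda i: dist(query, db[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    db = [[rng.random() for _ in range(4)] for _ in range(500)]
    pivots = rng.sample(db, 8)
    index = build_index(db, pivots, euclidean)  # index-time cost, not counted below
    query = [rng.random() for _ in range(4)]

    exhaustive, indexed = CountingDistance(euclidean), CountingDistance(euclidean)
    knn_exhaustive(db, query, 5, exhaustive)
    knn_permutation(db, index, pivots, query, 5, 50, indexed)
    print("distance evaluations:", exhaustive.calls, "vs", indexed.calls)
```

With n database objects, m pivots, and a candidate budget b, the exhaustive scan evaluates the distance n times per query, while this sketch evaluates it m + b times. The trade-off is that the answer is only approximate whenever b is smaller than n.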

id	SEDICI_37f6c77452bc4c510da5f00b7b8f43af
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/191306
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Similarity Searching using Hybrid Technique Permutation Graph and Clustering
dc.creator.none.fl_str_mv	Rocha, Gerardo Figueroa, Karina Reyes, Nora Susana
dc.subject.none.fl_str_mv	Ciencias Informáticas Similarity Search Metric Space Distance Calculations
publishDate	2025
dc.date.none.fl_str_mv	2025-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/191306
url	http://sedici.unlp.edu.ar/handle/10915/191306
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/isbn/978-987-8258-99-7 info:eu-repo/semantics/reference/hdl/10915/189846
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 527-537
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar