A novel distance that reduces information loss in continuous characters with few observations

Authors
Lo Valvo, Gerardo Andres; Lehmann, Oscar Emilio Rodrigo; Balseiro, Diego
Publication year
2023
Language
English
Resource type
article
Status
published version
Description
The calculation of pairwise distances is a fundamental step in many statistical analyses in biology and paleontology. The most commonly used distances work with a single observation per object and character, but in some scenarios multiple observations are available per object. In these situations, the information for the character spans an interval, and pairs of objects can have overlapping intervals, which further complicates the distance calculation. Some coefficients can deal with this wealth of information but are either too coarse to provide detailed results or too computationally demanding for even moderately large data sets. Here, we present the Distance Between Intervals (DBI), a novel semi-metric distance that can accommodate both single and multiple observations per object by analyzing them as intervals. The DBI ranges from 0 to 1 when the objects overlap and from 1 to infinity when they do not. It is easy to calculate and can be applied to a wide variety of data types. Both simulated and empirical test cases show that the DBI correctly ranks pairs of objects by their level of overlap and non-overlap, while other distances struggle to do so. The DBI can therefore provide a finer level of definition than other available distances for empirical data sets, while generally agreeing with the broad results they provide. An implementation of the DBI is provided for the R programming language.
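The interval view described in the abstract can be illustrated with a minimal Python sketch. It is not the DBI: the abstract does not state the DBI formula, and the authors' implementation is in R. The helper names `to_interval`, `overlap_length`, and `gap_length` are hypothetical, and summarizing observations by their minimum and maximum is an assumption made only for illustration.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def to_interval(observations: Sequence[float]) -> Interval:
    """Summarize one object's observations for a character as (min, max).

    A single observation yields a zero-width interval.
    Assumption: the range of the observations is used as the interval.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    return (min(observations), max(observations))


def overlap_length(a: Interval, b: Interval) -> float:
    """Length shared by two closed intervals (0.0 if they do not overlap)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def gap_length(a: Interval, b: Interval) -> float:
    """Distance separating two intervals (0.0 if they touch or overlap)."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


# Hypothetical measurements for three objects on one continuous character.
a = to_interval([2.0, 3.5, 5.0])  # several observations
b = to_interval([4.0, 6.0])       # overlaps a
c = to_interval([7.5])            # single observation, disjoint from a

print(overlap_length(a, b), gap_length(a, b))
print(overlap_length(a, c), gap_length(a, c))
```

Overlap and gap are the two quantities that a distance behaving as the abstract describes would have to separate: pairs with a positive overlap fall on one side of 1, and pairs with a positive gap fall on the other.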
Affiliation: Lo Valvo, Gerardo Andres. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Centro de Investigaciones en Ciencias de la Tierra. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Centro de Investigaciones en Ciencias de la Tierra; Argentina
Affiliation: Lehmann, Oscar Emilio Rodrigo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Museo Argentino de Ciencias Naturales "Bernardino Rivadavia"; Argentina
Affiliation: Balseiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Centro de Investigaciones en Ciencias de la Tierra. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Centro de Investigaciones en Ciencias de la Tierra; Argentina
Subject
DISTANCE COEFFICIENT
DISTANCE MATRIX
CONTINUOUS CHARACTERS
INTERVALS
OVERLAP
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/226219

Subject classification
https://purl.org/becyt/ford/1.1
https://purl.org/becyt/ford/1.5
https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
Publication date
2023-07
Publisher
Coquina Press
Citation
Lo Valvo, Gerardo Andres; Lehmann, Oscar Emilio Rodrigo; Balseiro, Diego; A novel distance that reduces information loss in continuous characters with few observations; Coquina Press; Palaeontologia Electronica; 26; 2; 7-2023; 1-9
ISSN
1094-8074
1532-3056
Handle
http://hdl.handle.net/11336/226219
Related identifiers
https://palaeo-electronica.org/content/current-in-press-articles/3889-novel-distance-for-intervals
doi:10.26879/1250
Format
application/pdf
Repository contact
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar