A novel distance that reduces information loss in continuous characters with few observations
- Authors
- Lo Valvo, Gerardo Andres; Lehmann, Oscar Emilio Rodrigo; Balseiro, Diego
- Publication year
- 2023
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- The calculation of pairwise distances is a fundamental step in many statistical analyses in biology and paleontology. The most commonly used distances work with a single observation per object and character, but there are scenarios where multiple observations are available per object. In these situations, the information for the character spans an interval, and pairs of objects can have overlapping intervals, which further complicates the distance calculation. Some coefficients can deal with this wealth of information but are either too coarse to provide detailed results or too computationally demanding for even moderately large data sets. Here, we present the Distance Between Intervals (DBI) as a novel semi-metric distance that can accommodate both single and multiple observations per object by analyzing them as intervals. The DBI ranges from 0 to 1 when there is an overlap between the objects and from 1 to infinity when there is no overlap between them. It is easy to calculate and can be applied to a wide variety of data types. Both simulated and empirical test cases show that the DBI correctly ranks pairs of objects by their level of overlap and non-overlap, while other distances struggle to do so. Therefore, the DBI can provide a finer level of definition than other available distances for empirical data sets, while generally agreeing with the broad results they provide. An implementation of DBI is provided for the R programming language.
Affiliation: Lo Valvo, Gerardo Andres. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Centro de Investigaciones en Ciencias de la Tierra. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Centro de Investigaciones en Ciencias de la Tierra; Argentina
Affiliation: Lehmann, Oscar Emilio Rodrigo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Museo Argentino de Ciencias Naturales "Bernardino Rivadavia"; Argentina
Affiliation: Balseiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Centro de Investigaciones en Ciencias de la Tierra. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Centro de Investigaciones en Ciencias de la Tierra; Argentina
- Subject
- DISTANCE COEFFICIENT; DISTANCE MATRIX; CONTINUOUS CHARACTERS; INTERVALS; OVERLAP
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/226219
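The description above states only the range behaviour of the DBI: values from 0 to 1 when two intervals overlap, and values above 1 when they do not. The sketch below is a toy distance that reproduces that range behaviour so the idea is easier to picture. It is not the published DBI formula, which this record does not give. The function name `toy_interval_distance`, the `scale` parameter, and the choice of one minus overlap-over-hull for the overlapping case are all assumptions made for illustration.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper


def toy_interval_distance(a: Interval, b: Interval, scale: float = 1.0) -> float:
    """Illustrative interval distance; NOT the published DBI formula.

    Returns a value in [0, 1] when the intervals overlap or touch,
    and a value greater than 1 when they are disjoint.
    `scale` is a hypothetical normalising constant for the gap,
    for example the total range of the character in the data set.
    """
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must satisfy lower <= upper")
    if scale <= 0:
        raise ValueError("scale must be positive")

    # Positive only when the intervals are disjoint; otherwise it is
    # minus the length of the shared segment.
    gap = max(a_lo, b_lo) - min(a_hi, b_hi)
    if gap > 0:
        return 1.0 + gap / scale

    # Length of the smallest interval containing both inputs.
    hull = max(a_hi, b_hi) - min(a_lo, b_lo)
    if hull == 0:
        return 0.0  # two identical single observations
    overlap = -gap
    return 1.0 - overlap / hull
```

Under these assumptions, `toy_interval_distance((0, 10), (0, 10))` returns `0.0` for identical intervals, and `toy_interval_distance((0, 1), (3, 4), scale=4)` returns `1.5` for disjoint ones. A single observation is passed as a zero-width interval such as `(2, 2)`.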
Full record metadata
id |
CONICETDig_1cdb650b43a593e4030ca630f443c37d |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/226219 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
A novel distance that reduces information loss in continuous characters with few observations |
dc.creator.none.fl_str_mv |
Lo Valvo, Gerardo Andres; Lehmann, Oscar Emilio Rodrigo; Balseiro, Diego |
dc.subject.none.fl_str_mv |
DISTANCE COEFFICIENT; DISTANCE MATRIX; CONTINUOUS CHARACTERS; INTERVALS; OVERLAP |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.1 https://purl.org/becyt/ford/1 https://purl.org/becyt/ford/1.5 https://purl.org/becyt/ford/1 https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
publishDate |
2023 |
dc.date.none.fl_str_mv |
2023-07 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/226219; Lo Valvo, Gerardo Andres; Lehmann, Oscar Emilio Rodrigo; Balseiro, Diego; A novel distance that reduces information loss in continuous characters with few observations; Coquina Press; Palaeontologia Electronica; 26; 2; 7-2023; 1-9; 1094-8074; 1532-3056; CONICET Digital; CONICET |
url |
http://hdl.handle.net/11336/226219 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://palaeo-electronica.org/content/current-in-press-articles/3889-novel-distance-for-intervals info:eu-repo/semantics/altIdentifier/doi/10.26879/1250 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Coquina Press |
publisher.none.fl_str_mv |
Coquina Press |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |