Demographically-Informed Prediction Discrepancy Index: Early Warnings of Demographic Biases for Unlabeled Populations
- Authors
- Mansilla, Lucas Andrés; Claucich, Estanislao; Echeveste, Rodrigo Sebastián; Milone, Diego Humberto; Ferrante, Enzo
- Year of publication
- 2024
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- An ever-growing body of work has shown that machine learning systems can be systematically biased against certain sub-populations defined by attributes like race or gender. Data imbalance and under-representation of certain populations in the training datasets have been identified as potential causes behind this phenomenon. However, understanding whether data imbalance with respect to a specific demographic group may result in biases for a given task and model class is not simple. An approach to answering this question is to perform controlled experiments, where several models are trained with different imbalance ratios and then their performance is evaluated on the target population. However, in the absence of ground-truth annotations at deployment for an unseen population, most fairness metrics cannot be computed. In this work, we explore an alternative method to study potential bias issues based on the output discrepancy of pools of models trained on different demographic groups. Models within a pool are otherwise identical in terms of architecture, hyper-parameters, and training scheme. Our hypothesis is that the output consistency between models may serve as a proxy to anticipate biases concerning demographic groups. In other words, if models tailored to different demographic groups produce inconsistent predictions, then biases are more prone to appear in the task under analysis. We formulate the Demographically-Informed Prediction Discrepancy Index (DIPDI) and validate our hypothesis in numerical experiments using both synthetic and real-world datasets. Our work sheds light on the relationship between model output discrepancy and demographic biases and provides a means to anticipate potential bias issues in the absence of ground-truth annotations. Indeed, we show how DIPDI could provide early warnings about potential demographic biases when deploying machine learning models on new and unlabeled populations that exhibit demographic shifts.
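The record does not reproduce the paper's formulation, but the idea in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the paper's DIPDI definition: the function names, the bootstrap resampling used to create diversity within each pool, and the mean absolute probability difference used as the discrepancy measure are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's exact DIPDI): train otherwise-identical
# model pools on different demographic groups and use their output disagreement
# on unlabeled data as an early-warning signal for potential bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_group_pools(X, y, groups, n_models=5, seed=0):
    """Train one pool of identical models per demographic group.

    Bootstrap resampling is used here only to create diversity within a pool;
    the paper's actual pool construction may differ.
    """
    rng = np.random.default_rng(seed)
    pools = {}
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)  # samples belonging to group g
        pools[g] = []
        for _ in range(n_models):
            boot = rng.choice(idx, size=idx.size, replace=True)
            pools[g].append(LogisticRegression(max_iter=1000).fit(X[boot], y[boot]))
    return pools

def prediction_discrepancy(pools, X_new):
    """Mean absolute difference between group-wise average predicted
    probabilities on new, unlabeled data (no labels required)."""
    avg = {g: np.mean([m.predict_proba(X_new)[:, 1] for m in pool], axis=0)
           for g, pool in pools.items()}
    names = list(avg)
    pair_diffs = [np.mean(np.abs(avg[a] - avg[b]))
                  for i, a in enumerate(names) for b in names[i + 1:]]
    return float(np.mean(pair_diffs))
```

Under these assumptions, a markedly higher discrepancy on a new, demographically shifted population than on a reference population would be the kind of early warning the abstract describes, obtained without ground-truth annotations.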
Affiliation: Mansilla, Lucas Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Claucich, Estanislao. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Echeveste, Rodrigo Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Ferrante, Enzo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
- Subject
- Biases
- Unsupervised Methods
- Machine Learning
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI Identifier
- oai:ri.conicet.gov.ar:11336/266763
- Publisher
- MIT Press
- Journal
- Transactions on Machine Learning Research; 2-2024; 1-24 (ISSN 2835-8856)
- Publication date
- 2024-02
- Identifiers
- http://hdl.handle.net/11336/266763
- https://openreview.net/forum?id=TorS8rxr3R
- Subject codes (FORD)
- https://purl.org/becyt/ford/2.2
- https://purl.org/becyt/ford/2