Demographically-Informed Prediction Discrepancy Index: Early Warnings of Demographic Biases for Unlabeled Populations

Authors
Mansilla, Lucas Andrés; Claucich, Estanislao; Echeveste, Rodrigo Sebastián; Milone, Diego Humberto; Ferrante, Enzo
Publication year
2024
Language
English
Resource type
article
Status
published version
Description
An ever-growing body of work has shown that machine learning systems can be systematically biased against certain sub-populations defined by attributes like race or gender. Data imbalance and under-representation of certain populations in the training datasets have been identified as potential causes behind this phenomenon. However, understanding whether data imbalance with respect to a specific demographic group may result in biases for a given task and model class is not simple. An approach to answering this question is to perform controlled experiments, where several models are trained with different imbalance ratios and then their performance is evaluated on the target population. However, in the absence of ground-truth annotations at deployment for an unseen population, most fairness metrics cannot be computed. In this work, we explore an alternative method to study potential bias issues based on the output discrepancy of pools of models trained on different demographic groups. Models within a pool are otherwise identical in terms of architecture, hyper-parameters, and training scheme. Our hypothesis is that the output consistency between models may serve as a proxy to anticipate biases concerning demographic groups. In other words, if models tailored to different demographic groups produce inconsistent predictions, then biases are more prone to appear in the task under analysis. We formulate the Demographically-Informed Prediction Discrepancy Index (DIPDI) and validate our hypothesis in numerical experiments using both synthetic and real-world datasets. Our work sheds light on the relationship between model output discrepancy and demographic biases and provides a means to anticipate potential bias issues in the absence of ground-truth annotations. Indeed, we show how DIPDI could provide early warnings about potential demographic biases when deploying machine learning models on new and unlabeled populations that exhibit demographic shifts.
Affiliation: Mansilla, Lucas Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Claucich, Estanislao. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Echeveste, Rodrigo Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Ferrante, Enzo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
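Note: the abstract describes DIPDI only at a high level; the exact index is defined in the paper. The minimal sketch below illustrates the underlying idea under stated assumptions: two pools of otherwise-identical models are trained on data from different demographic groups, and their output discrepancy on unlabeled target data is used as an early-warning signal. The function names (train_pool, pool_scores, prediction_discrepancy), the bootstrap-based pooling, and the mean-absolute-difference discrepancy are illustrative assumptions, not the authors' formulation.

```python
# Illustrative sketch of the prediction-discrepancy idea (NOT the exact DIPDI
# formula from the paper): train otherwise-identical model pools on data from
# two demographic groups and compare their outputs on unlabeled target data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def train_pool(X, y, n_models=5):
    """Train a pool of identically configured models on bootstrap resamples."""
    pool = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        pool.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return pool

def pool_scores(pool, X):
    """Average predicted probability of the positive class across the pool."""
    return np.mean([m.predict_proba(X)[:, 1] for m in pool], axis=0)

def prediction_discrepancy(pool_a, pool_b, X_unlabeled):
    """Mean absolute difference between the two pools' outputs (proxy index)."""
    return np.mean(np.abs(pool_scores(pool_a, X_unlabeled)
                          - pool_scores(pool_b, X_unlabeled)))

# Synthetic example: group B's feature-label relationship differs from group A's,
# so the group-specific pools should disagree on a shifted target population.
X_a = rng.normal(0.0, 1.0, size=(500, 5))
y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(1.0, 1.0, size=(500, 5))
y_b = (X_b[:, 1] > 1).astype(int)
X_target = rng.normal(1.0, 1.0, size=(200, 5))  # unlabeled deployment data

pool_a, pool_b = train_pool(X_a, y_a), train_pool(X_b, y_b)
print("discrepancy on target:", prediction_discrepancy(pool_a, pool_b, X_target))
```

A larger discrepancy value on the unlabeled target population would, under this toy setup, flag that group-specific models behave inconsistently there; no ground-truth labels for the target data are required.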
Subject
Biases
Unsupervised Methods
Machine Learning
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/266763

Publisher
MIT Press
Journal
Transactions on Machine Learning Research; 2-2024; pp. 1-24
ISSN
2835-8856
Citation
Mansilla, Lucas Andrés; Claucich, Estanislao; Echeveste, Rodrigo Sebastián; Milone, Diego Humberto; Ferrante, Enzo; Demographically-Informed Prediction Discrepancy Index: Early Warnings of Demographic Biases for Unlabeled Populations; MIT Press; Transactions on Machine Learning Research; 2-2024; 1-24
Subject classification (FORD)
https://purl.org/becyt/ford/2.2
https://purl.org/becyt/ford/2
Format
application/pdf
URL
http://hdl.handle.net/11336/266763
Alternative URL
https://openreview.net/forum?id=TorS8rxr3R