Demographically-Informed Prediction Discrepancy Index: Early Warnings of Demographic Biases for Unlabeled Populations

Authors
Mansilla, Lucas Andrés; Claucich, Estanislao; Echeveste, Rodrigo Sebastián; Milone, Diego Humberto; Ferrante, Enzo
Publication year
2024
Language
English
Resource type
article
Status
published version
Description
An ever-growing body of work has shown that machine learning systems can be systematically biased against certain sub-populations defined by attributes like race or gender. Data imbalance and under-representation of certain populations in the training datasets have been identified as potential causes behind this phenomenon. However, understanding whether data imbalance with respect to a specific demographic group may result in biases for a given task and model class is not simple. An approach to answering this question is to perform controlled experiments, where several models are trained with different imbalance ratios and then their performance is evaluated on the target population. However, in the absence of ground-truth annotations at deployment for an unseen population, most fairness metrics cannot be computed. In this work, we explore an alternative method to study potential bias issues based on the output discrepancy of pools of models trained on different demographic groups. Models within a pool are otherwise identical in terms of architecture, hyper-parameters, and training scheme. Our hypothesis is that the output consistency between models may serve as a proxy to anticipate biases concerning demographic groups. In other words, if models tailored to different demographic groups produce inconsistent predictions, then biases are more prone to appear in the task under analysis. We formulate the Demographically-Informed Prediction Discrepancy Index (DIPDI) and validate our hypothesis in numerical experiments using both synthetic and real-world datasets. Our work sheds light on the relationship between model output discrepancy and demographic biases and provides a means to anticipate potential bias issues in the absence of ground-truth annotations. Indeed, we show how DIPDI could provide early warnings about potential demographic biases when deploying machine learning models on new and unlabeled populations that exhibit demographic shifts.
Affiliation: Mansilla, Lucas Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Claucich, Estanislao. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Echeveste, Rodrigo Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Ferrante, Enzo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
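Note: the abstract describes DIPDI only at a high level; the exact index is defined in the paper. The minimal sketch below illustrates the underlying idea under stated assumptions: two pools of otherwise-identical models are trained on data from different demographic groups, and their output discrepancy on unlabeled target data is used as an early-warning signal. The function names (train_pool, pool_scores, prediction_discrepancy), the bootstrap-based pooling, and the mean-absolute-difference discrepancy are illustrative assumptions, not the authors' formulation.

```python
# Illustrative sketch of the prediction-discrepancy idea (NOT the exact DIPDI
# formula from the paper): train otherwise-identical model pools on data from
# two demographic groups and compare their outputs on unlabeled target data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def train_pool(X, y, n_models=5):
    """Train a pool of identically configured models on bootstrap resamples."""
    pool = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        pool.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return pool

def pool_scores(pool, X):
    """Average predicted probability of the positive class across the pool."""
    return np.mean([m.predict_proba(X)[:, 1] for m in pool], axis=0)

def prediction_discrepancy(pool_a, pool_b, X_unlabeled):
    """Mean absolute difference between the two pools' outputs (proxy index)."""
    return np.mean(np.abs(pool_scores(pool_a, X_unlabeled)
                          - pool_scores(pool_b, X_unlabeled)))

# Synthetic example: group B's feature-label relationship differs from group A's,
# so the group-specific pools should disagree on a shifted target population.
X_a = rng.normal(0.0, 1.0, size=(500, 5))
y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(1.0, 1.0, size=(500, 5))
y_b = (X_b[:, 1] > 1).astype(int)
X_target = rng.normal(1.0, 1.0, size=(200, 5))  # unlabeled deployment data

pool_a, pool_b = train_pool(X_a, y_a), train_pool(X_b, y_b)
print("discrepancy on target:", prediction_discrepancy(pool_a, pool_b, X_target))
```

A larger discrepancy value on the unlabeled target population would, under this toy setup, flag that group-specific models behave inconsistently there; no ground-truth labels for the target data are required.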
Subject
Biases
Unsupervised Methods
Machine Learning
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/266763

Publisher
MIT Press
Journal
Transactions on Machine Learning Research; 2-2024; pp. 1-24
ISSN
2835-8856
Citation
Mansilla, Lucas Andrés; Claucich, Estanislao; Echeveste, Rodrigo Sebastián; Milone, Diego Humberto; Ferrante, Enzo; Demographically-Informed Prediction Discrepancy Index: Early Warnings of Demographic Biases for Unlabeled Populations; MIT Press; Transactions on Machine Learning Research; 2-2024; 1-24
Subject classification (FORD)
https://purl.org/becyt/ford/2.2
https://purl.org/becyt/ford/2
Format
application/pdf
URL
http://hdl.handle.net/11336/266763
Alternative URL
https://openreview.net/forum?id=TorS8rxr3R