Decoding semantic ambiguity in Large Language Models: Aligning human behavioral responses with GPT-2’s internal representations

Authors
Gianolini, Agustín; Paez, Belén; Totaro, Facundo; Laurino, Julieta; Travi, Fermín; Fernández Slezak, Diego; Kaczer, Laura; Kamienkowski, Juan E.; Bianchi, Bruno
Year of publication
2025
Language
English
Resource type
Conference paper
Status
Published version
Description
Large Language Models (LLMs), such as GPT-2, exhibit human-like text processing, yet their internal mechanisms for resolving semantic ambiguity remain opaque, much like the “black box” of human cognition. This study investigates how LLMs disambiguate concrete nouns by comparing their semantic biases with human behavioral responses. A corpus of sentences containing ambiguous words (e.g., “note”) paired with biasing contexts (e.g., short paragraphs related to “music” and “education”) was created. Human participants identified the perceived meanings of the ambiguous words in these contexts, establishing a behavioral ground truth (i.e., the human bias). The computer bias was measured via cosine distances between each meaning’s static embedding and the ambiguous word’s contextualized embedding. To improve the computer bias metric, two technical steps were implemented: (1) the model was fine-tuned to obtain a word-based tokenization, and (2) each ambiguous word’s meaning was defined using word lists. Results revealed a nonlinear dynamic in the GPT-2 computer bias and an additive effect of the two improvements analyzed in the present work. Additionally, we found that the correlation between human bias and computer bias, measured layer by layer, peaked at the middle layers. This result is in line with previous findings in human-model alignment research and suggests shared computational principles between human cognition and LLM processing for resolving ambiguity. The study advances interpretability research by linking model-internal representations to human behavioral benchmarks, offering insights into both artificial and biological language systems.
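The bias metric described in the abstract can be sketched schematically: each meaning is represented by the static embeddings of a word list (improvement (2)), and the bias is the difference between the ambiguous word's average cosine similarity to each list. The sketch below uses toy NumPy vectors in place of actual GPT-2 static and contextualized embeddings; the function name `computer_bias` and the sign convention are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def computer_bias(contextual_vec, meaning_a_vecs, meaning_b_vecs):
    """Bias of a contextualized embedding toward meaning A vs. meaning B.

    Each meaning is defined by a word list, so similarity is averaged
    over the list's static embeddings. Positive values indicate the
    context pulls the ambiguous word toward meaning A, negative toward B.
    """
    sim_a = np.mean([cosine_sim(contextual_vec, v) for v in meaning_a_vecs])
    sim_b = np.mean([cosine_sim(contextual_vec, v) for v in meaning_b_vecs])
    return sim_a - sim_b

# Toy example with orthogonal "sense" directions for the word "note":
music = [np.array([1.0, 0.0])]      # hypothetical word list for the music sense
education = [np.array([0.0, 1.0])]  # hypothetical word list for the education sense
ctx = np.array([0.9, 0.1])          # stand-in for a contextualized embedding
bias = computer_bias(ctx, music, education)
```

In the study this quantity would be computed from each transformer layer's hidden state for the ambiguous token, yielding the layer-by-layer bias profile that is then correlated with the human behavioral bias.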
Sociedad Argentina de Informática e Investigación Operativa
Subject
Ciencias Informáticas
LLMs
disambiguation
neurolinguistics
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/190657

Record ID
SEDICI_0cc57fb1dc6b34196a8f6ed88b22b306
Handle
http://sedici.unlp.edu.ar/handle/10915/190657
Alternative identifier
https://revistas.unlp.edu.ar/JAIIO/article/view/19824
ISSN
2451-7496
Date
2025-08
Format
application/pdf
Pages
265-282