Decoding semantic ambiguity in Large Language Models: Aligning human behavioral responses with GPT-2’s internal representations

Authors
Gianolini, Agustín; Paez, Belén; Totaro, Facundo; Laurino, Julieta; Travi, Fermín; Fernández Slezak, Diego; Kaczer, Laura; Kamienkowski, Juan E.; Bianchi, Bruno
Year of publication
2025
Language
English
Resource type
Conference paper
Status
Published version
Description
Large Language Models (LLMs), such as GPT-2, exhibit human-like text processing, yet their internal mechanisms for resolving semantic ambiguity remain opaque, much like the “black box” of human cognition. This study investigates how LLMs disambiguate concrete nouns by comparing their semantic biases with human behavioral responses. A corpus of sentences containing ambiguous words (e.g., “note”) paired with biasing contexts (e.g., short paragraphs related to “music” and “education”) was created. Human participants identified the perceived meanings of the ambiguous words in these contexts, establishing a behavioral ground truth (i.e., the human bias). The computer bias was measured via cosine distances between each meaning’s static embedding and the ambiguous word’s contextualized embedding. To improve the computer bias metric, two technical steps were implemented: (1) the model was fine-tuned to obtain a word-based tokenization, and (2) each ambiguous word’s meaning was defined using word lists. Results revealed a nonlinear dynamic in the GPT-2 computer bias and an additive effect of the two improvements analyzed in the present work. Additionally, we found that the correlation between human bias and computer bias, measured layer by layer, peaked at the middle layers. This result is in line with previous findings in human-model alignment research and suggests shared computational principles between human cognition and LLM processing for resolving ambiguity. The study advances interpretability research by linking model-internal representations to human behavioral benchmarks, offering insights into both artificial and biological language systems.
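The bias metric described in the abstract can be sketched schematically: each meaning is represented by the static embeddings of a word list (improvement (2)), and the bias is the difference between the ambiguous word's average cosine similarity to each list. The sketch below uses toy NumPy vectors in place of actual GPT-2 static and contextualized embeddings; the function name `computer_bias` and the sign convention are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def computer_bias(contextual_vec, meaning_a_vecs, meaning_b_vecs):
    """Bias of a contextualized embedding toward meaning A vs. meaning B.

    Each meaning is defined by a word list, so similarity is averaged
    over the list's static embeddings. Positive values indicate the
    context pulls the ambiguous word toward meaning A, negative toward B.
    """
    sim_a = np.mean([cosine_sim(contextual_vec, v) for v in meaning_a_vecs])
    sim_b = np.mean([cosine_sim(contextual_vec, v) for v in meaning_b_vecs])
    return sim_a - sim_b

# Toy example with orthogonal "sense" directions for the word "note":
music = [np.array([1.0, 0.0])]      # hypothetical word list for the music sense
education = [np.array([0.0, 1.0])]  # hypothetical word list for the education sense
ctx = np.array([0.9, 0.1])          # stand-in for a contextualized embedding
bias = computer_bias(ctx, music, education)
```

In the study this quantity would be computed from each transformer layer's hidden state for the ambiguous token, yielding the layer-by-layer bias profile that is then correlated with the human behavioral bias.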
Sociedad Argentina de Informática e Investigación Operativa
Subject
Ciencias Informáticas
LLMs
disambiguation
neurolinguistics
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/190657

Record ID
SEDICI_0cc57fb1dc6b34196a8f6ed88b22b306
Handle
http://sedici.unlp.edu.ar/handle/10915/190657
Alternative identifier
https://revistas.unlp.edu.ar/JAIIO/article/view/19824
ISSN
2451-7496
Date
2025-08
Format
application/pdf
Pages
265-282