Decoding semantic ambiguity in Large Language Models: Aligning human behavioral responses with GPT-2’s internal representations
- Authors
- Gianolini, Agustín; Paez, Belén; Totaro, Facundo; Laurino, Julieta; Travi, Fermín; Fernández Slezak, Diego; Kaczer, Laura; Kamienkowski, Juan E.; Bianchi, Bruno
- Publication year
- 2025
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Large Language Models (LLMs), such as GPT-2, exhibit human-like text processing, yet their internal mechanisms for resolving semantic ambiguity remain opaque, much like the “black box” of human cognition. This study investigates how LLMs disambiguate concrete nouns by comparing their semantic biases to human behavioral responses. A corpus of sentences containing ambiguous words (e.g., “note”) paired with biasing contexts (e.g., short paragraphs related to “music” or “education”) was created. Human participants identified the meanings they perceived for the ambiguous words in these contexts, establishing a behavioral ground truth (i.e., the human bias). The computer bias was measured via cosine distances between the static embeddings of each meaning and the contextualized embeddings of the ambiguous words. To improve the computer bias metric, two technical steps were implemented: (1) the model was fine-tuned to obtain a word-based tokenization, and (2) each ambiguous word’s meaning was defined using word lists. Results revealed a nonlinear dynamic in the GPT-2 computer bias and an additive effect of the two improvements analyzed in the present work. Additionally, we found that the correlation between human bias and computer bias, measured layer by layer, peaked at the middle layers, in line with previous findings in human–model alignment research. This suggests shared computational principles between human cognition and LLM processing for resolving ambiguity. The study advances interpretability research by linking model-internal representations to human behavioral benchmarks, offering insights into both artificial and biological language systems.
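The bias metric described in the abstract can be sketched in a few lines. The following is a minimal reconstruction under stated assumptions, not the authors' implementation: it uses stock GPT-2 with its default BPE tokenizer (the paper fine-tunes the model to obtain word-based tokenization) and a single anchor word per meaning (the paper defines each meaning with a word list); the example sentence and meaning labels are illustrative only.

```python
# Minimal sketch of a per-layer "computer bias": cosine comparison between
# the contextualized embedding of an ambiguous word and static embeddings
# of two candidate meanings. Assumptions not taken from the paper: stock
# GPT-2 BPE tokenization and one anchor word per meaning.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

def static_embedding(word: str) -> torch.Tensor:
    """Mean of the input-embedding (wte) rows for the word's BPE pieces."""
    ids = tokenizer(" " + word, return_tensors="pt")["input_ids"][0]
    return model.wte.weight[ids].mean(dim=0)

def computer_bias(context: str, meaning_a: str, meaning_b: str) -> list[float]:
    """Per-layer bias of the last token of `context` (the ambiguous word):
    positive values lean toward meaning_a, negative toward meaning_b."""
    enc = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # embedding layer + 12 blocks
    e_a, e_b = static_embedding(meaning_a), static_embedding(meaning_b)
    biases = []
    for layer in hidden:
        h = layer[0, -1]  # contextualized embedding of the ambiguous word
        biases.append((torch.cosine_similarity(h, e_a, dim=0)
                       - torch.cosine_similarity(h, e_b, dim=0)).item())
    return biases

# Illustrative use with the abstract's ambiguous word "note":
bias_per_layer = computer_bias("She hummed the melody and held the final note",
                               "music", "education")
```

Repeating `computer_bias` over a whole corpus and correlating each layer's values with the human bias across items (e.g., with `numpy.corrcoef`) yields the layer-by-layer alignment curve that, per the abstract, peaks at the middle layers.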
- Publisher
- Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Ciencias Informáticas; LLMs; disambiguation; neurolinguistics
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI Identifier
- oai:sedici.unlp.edu.ar:10915/190657
View the full record metadata
| Field | Value |
|---|---|
| id | SEDICI_0cc57fb1dc6b34196a8f6ed88b22b306 |
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/190657 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.date.none.fl_str_mv | 2025-08 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/190657 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://revistas.unlp.edu.ar/JAIIO/article/view/19824; info:eu-repo/semantics/altIdentifier/issn/2451-7496 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
| dc.format.none.fl_str_mv | application/pdf; 265-282 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
| _version_ | 1858282592016531456 |
| score | 12.665996 |