VLMs May Not Globally Enhance Human Alignment over LLMs During Natural Reading

ArXiv CS.CL · 2026-05-28 · Authors: Jinzhou Wu, Zhengwu Ma, Jixing Li, Baoping Tang, Zitong Lu

Abstract

arXiv:2605.28818v1

Large language models (LLMs) have become increasingly useful computational models of human language processing, but it remains unclear whether vision-language learning makes text representations more human-like during natural reading. Here, we address this question by comparing tightly matched LLM and vision-language model (VLM) pairs under a strictly text-only setting, allowing us to isolate the effect of multimodal training history from online visual input or cross-modal fusion. We evaluate model alignment with a human natural-reading dataset that includes whole-cortex fMRI responses and synchronized eye-tracking saccades. Our findings demonstrate that multimodal pretraining may not confer a uniform, global advantage in human alignment during natural reading, indicating that language-internal representations remain the key factor for modeling human text processing.
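The abstract does not say how the matched LLM and VLM pairs were compared statistically. As a rough illustration only, the sketch below shows one generic way to ask whether a VLM shows a consistent alignment advantage over its matched LLM: a paired sign-flip permutation test on per-region score differences. The function name `paired_sign_flip_test`, the score values, and the choice of test are all assumptions made for this example and are not taken from the paper.

```python
import random
from statistics import mean


def paired_sign_flip_test(llm_scores, vlm_scores, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired differences (VLM - LLM).

    Hypothetical helper for illustration; not the paper's analysis.
    """
    if len(llm_scores) != len(vlm_scores):
        raise ValueError("score lists must be paired and equal in length")
    diffs = [v - l for l, v in zip(llm_scores, vlm_scores)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null of no systematic advantage, each difference's sign is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Made-up per-region alignment scores (e.g. held-out correlations); not real data.
llm = [0.21, 0.18, 0.25, 0.30, 0.12, 0.27]
vlm = [0.22, 0.17, 0.26, 0.29, 0.13, 0.27]
diff, p = paired_sign_flip_test(llm, vlm)
print(f"mean VLM - LLM difference: {diff:.4f}, p = {p:.3f}")
```

Pairing the scores region by region mirrors the matched-pair design described in the abstract: each difference reflects the same region under two models, so differences in multimodal training history are what the test probes.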
