Representation Without Control: Testing the Realization Effect in Language Models

arXiv cs.AI | 2026-05-26 | Authors: Ciarán Walsh, Emilio Barkett

Abstract

arXiv:2605.25151v1. Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitive mechanisms rather than prompt-sensitive surface patterns. We study this question through the realization effect, a well-characterized finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. We evaluate LLM behavior at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering. Prompt-only results show systematic condition sensitivity, but the directional pattern does not reproduce human realization-effect predictions. Gemma's residual stream contains a linearly decodable realization-status signal at layer 18 that generalizes to held-out prompts.
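
The two representation-level analyses named in the abstract, linear readout and activation steering, can be illustrated with a toy sketch. This is not the authors' code: the vectors are synthetic stand-ins for residual-stream activations, the probe is a simple difference-of-means classifier (the abstract does not say which probe the paper uses), and every name here (`make_activations`, `fit_probe`, `steer`, the signal strength, the steering scale `alpha`) is a hypothetical choice made for illustration.

```python
import random

def make_activations(n, dim, signal, rng):
    """Synthetic stand-ins for activations. Label 1 = realized, 0 = paper."""
    data = []
    for i in range(n):
        label = i % 2
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Assumption: the label is planted along the first coordinate.
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data

def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

def fit_probe(train):
    """Difference-of-means linear probe: returns (direction, bias)."""
    pos = mean_vector([v for v, y in train if y == 1])
    neg = mean_vector([v for v, y in train if y == 0])
    direction = [p - q for p, q in zip(pos, neg)]
    midpoint = [(p + q) / 2 for p, q in zip(pos, neg)]
    # The decision boundary passes through the midpoint of the class means.
    bias = -sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias

def predict(probe, vec):
    """Classify a vector as realized (1) or paper (0) by its probe score."""
    direction, bias = probe
    score = sum(d * x for d, x in zip(direction, vec)) + bias
    return 1 if score > 0 else 0

def accuracy(probe, data):
    """Fraction of labeled vectors the probe classifies correctly."""
    return sum(predict(probe, v) == y for v, y in data) / len(data)

def steer(vec, direction, alpha):
    """Activation steering in its simplest form: add a scaled direction."""
    return [x + alpha * d for x, d in zip(vec, direction)]

rng = random.Random(0)
train = make_activations(200, 16, 2.0, rng)
held_out = make_activations(100, 16, 2.0, rng)  # never seen during fitting

probe = fit_probe(train)
held_out_acc = accuracy(probe, held_out)

# Push every held-out "paper" vector along the probe direction.
paper_vectors = [v for v, y in held_out if y == 0]
steered_preds = [predict(probe, steer(v, probe[0], 2.0)) for v in paper_vectors]
```

Held-out accuracy measures whether the signal is linearly decodable beyond the fitting set. Steering here only flips the probe's own readout; whether adding such a direction inside a real model changes its risk-taking behavior is a separate empirical question that this toy does not address.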
