Failure of contextual invariance in large language models
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2603.23485v2 | Announce Type: replace
Abstract: Standard evaluation practices assume that large language model (LLM) outputs are stable when prompts are embedded in contextually equivalent discourses. Here, we test this assumption in the setting of gender inference. Using a controlled pronoun selection task, we introduce minimal, theoretically uninformative discourse context and find that this induces large, systematic shifts in mode