The Self-Correction Illusion: LLMs Correct Others but Not Themselves

ArXiv CS.CL | 2026-06-05 | Authors: Kuan-Yen Chen, Fang-Yi Su, Jung-Hsien Chiang

Abstract

arXiv:2606.05976v1 (announce type: cross)

Recent work shows that LLM agents struggle to correct errors in their own reasoning traces, yet show markedly higher correction rates when identical claims appear under external sources. We ask whether this asymmetry reflects a capability deficit or a role-label artifact: does an agent's willingness to correct a wrong claim depend causally on the chat-template role that carries it, rather than on the claim's content? Our setup keeps the erroneous claim byte-identical across all conditions (SHA-256 verified) and varies only its wrapping role: the agent's own role, a user message, a tool response, or a system block. Across 13 model-domain cells covering seven model families and three domains ($n = 30$ paired tasks per cell), relabeling the claim from the agent's own role to an external role lifts the explicit-correction rate by 23 to 93 percentage points, with 10 of 13 cells reaching $p$ […] dominates on math, while a plain…

The abstract may be incomplete; see the original paper for the full text.
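
The byte-identical setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`wrap_claim`, `sha256_hex`, `build_conditions`), the example claim, and the use of "assistant" as the label for the agent's own role are all assumptions made for illustration. It only shows how one erroneous claim might be wrapped under different chat roles while a SHA-256 digest confirms the content never changes.

```python
import hashlib

# Role labels are illustrative. "assistant" stands in for the agent's own
# role (an assumption); "user", "tool", and "system" appear in the abstract.
ROLES = ("assistant", "user", "tool", "system")


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 bytes of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def wrap_claim(claim: str, role: str) -> dict:
    """Wrap the claim in a chat-style message carrying the given role."""
    return {"role": role, "content": claim}


def build_conditions(claim: str, roles=ROLES) -> dict:
    """Build one message per role and check the content is byte-identical."""
    conditions = {role: wrap_claim(claim, role) for role in roles}
    digests = {sha256_hex(msg["content"]) for msg in conditions.values()}
    if len(digests) != 1:
        raise ValueError("claim content differs across role conditions")
    return conditions


# Hypothetical erroneous claim (the correct product is 408).
claim = "17 * 24 = 418"
conditions = build_conditions(claim)

for role, msg in conditions.items():
    print(role, sha256_hex(msg["content"])[:12])
```

Because only the `role` field differs between conditions, any change in correction behavior in such a design could be attributed to the role label rather than to the wording of the claim.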
