Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity

arXiv cs.CL | 2026-06-02 | Authors: Jiaming Qu, Lucheng fu, Yibo Hu

Abstract

arXiv:2606.01637v1

Large language models are increasingly used in multi-agent systems, where they see and respond to other agents' answers. A key risk is conformity: a model may abandon its own answer simply because others agree on a different one. Prior studies show that LLMs often revise toward a majority answer, but it remains unclear whether these revisions help correct mistakes as often as they introduce new errors. In this paper, we conduct a controlled study in which an LLM first answers a question, then sees simulated peer responses before making a final decision. We manipulate two social cues, consensus structure and authority labels assigned to peers, and measure how they influence beneficial and harmful revisions. Across four open-weight LLMs and seven QA datasets, we find that peer agreement makes it much easier to mislead initially correct models than to correct initially wrong ones.
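The distinction between harmful and beneficial revision can be made concrete with a small sketch. The abstract does not give the paper's exact metric definitions, so the ones below are assumptions: the harmful revision rate is taken as the share of initially correct answers that end up wrong after the peer responses are shown, and the beneficial revision rate as the share of initially wrong answers that end up correct. The `Trial` record and `revision_rates` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question: was the model right before and after seeing peers?"""
    initial_correct: bool
    final_correct: bool


def revision_rates(trials):
    """Return assumed harmful/beneficial revision rates over a list of trials.

    Assumed definitions (not taken from the paper):
      harmful_rate    = (correct -> wrong) / (initially correct)
      beneficial_rate = (wrong -> correct) / (initially wrong)
    """
    n_correct = sum(t.initial_correct for t in trials)
    n_wrong = len(trials) - n_correct
    harmful = sum(t.initial_correct and not t.final_correct for t in trials)
    beneficial = sum((not t.initial_correct) and t.final_correct for t in trials)
    return {
        # Guard against empty denominators so the function never divides by zero.
        "harmful_rate": harmful / n_correct if n_correct else 0.0,
        "beneficial_rate": beneficial / n_wrong if n_wrong else 0.0,
    }


# Toy data, invented for illustration: 4 initially correct and 4 initially wrong answers.
toy_trials = [
    Trial(True, False), Trial(True, False), Trial(True, True), Trial(True, True),
    Trial(False, True), Trial(False, False), Trial(False, False), Trial(False, False),
]
rates = revision_rates(toy_trials)
```

Under these assumed definitions, the paper's headline finding corresponds to a harmful rate that exceeds the beneficial rate when peers agree on a different answer. Conditioning each rate on the model's initial correctness keeps the two directions comparable even when the model starts out right far more often than wrong.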
