Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness 文章

ArXiv CS.CL2026-06-05NEWSen作者: Victor De Marez, Luna De Bruyne, Walter Daelemans

摘要

arXiv:2606.06306v1 Announce Type: new Abstract: Factual sycophancy occurs when a language model abandons a correct, verifiable answer under social pressure. Because a flip occurs only when pressure toward a false answer exceeds the model's neutral preference for the truth, flip rates conflate two mechanisms: the strength of that baseline preference (truth margin), and how far pressure shifts it (manipulation sensitivity). We decompose factual sycophancy into these channels and use them to separate the effects of size and instruction tuning across 56 open-weight models spanning 0.3B-32B parameters and 13 manipulation types. We find that vulnerability is governed mainly by size, but instruction tuning changes how size acts: small instruction-tuned models can become less robust, whereas large instruction-tuned models usually become more robust. Instruction tuning primarily increases truth margin, but its behavioral effect depends on manipulation type.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据

相关技术

暂无数据