Truthful AI Advisors: A Pre-Specified Benchmark for Large Language Model Honesty Under Preference Misalignment 文章

ArXiv CS.CL2026-06-02NEWSen作者: Hamidreza Hasani Balyani, Seyed Pouyan Mousavi Davoudi, Alireza Amiri-Margavi, Amin Gholami Davodi, Arshia Gharagozlou

详细信息

来源站点
ArXiv CS.CL
作者
Hamidreza Hasani Balyani, Seyed Pouyan Mousavi Davoudi, Alireza Amiri-Margavi, Amin Gholami Davodi, Arshia Gharagozlou
文章类型
NEWS
语言
en
发布日期
2026-06-02

摘要

arXiv:2606.01456v1 Announce Type: cross Abstract: Large language models are increasingly deployed as advisors whose objective is not aligned with the user's: recommenders optimize for engagement, sales assistants for purchases, negotiation agents for concessions. Whether such advisors stay truthful when honesty conflicts with their own payoff is a core alignment-evaluation question. We turn the canonical Crawford-Sobel cheap-talk model into a pre-specified benchmark for LLM honesty under preference misalignment. Cheap-talk theory predicts neither full revelation nor silence but coarse monotone partitions, with fewer informative intervals as preference conflict grows. A sender observes a state omega in [0,1], wants the receiver's action near omega+b, and sends one costless message to a receiver whose ideal action is omega. The design uses 5 bias levels, 3 prompt frames, a fixed low-temperature setting, and 200 states per cell: 12,000 sender calls. For the positive-bias grid b in {0.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据