Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs 文章

ArXiv CS.CL2026-06-10NEWSen作者: Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou

详细信息

来源站点: ArXiv CS.CL
作者: Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou
文章类型: NEWS
语言: en
发布日期: 2026-06-10

摘要

arXiv:2606.10852v1 Announce Type: new Abstract: LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements, rather, they arise from selective treatment of true material facts: omitting adverse evidence, softening unfavorable details, emphasizing favorable details, or replacing precise qualifications with vague language. Existing benchmarks largely miss this subtler and arguably more dangerous failure mode. We introduce JANUS, a benchmark for measuring goal-conditioned pragmatic distortion in fact-grounded LLM outputs. Each scenario in our benchmark provides a fixed pool of favorable and adverse facts and compares a neutral condition against a goal-directed condition, such as increasing adoption, enrollment, approval, or support, despite potential harm to directly affected individuals or groups.

Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs 文章

详细信息

摘要

相关事件

相关公司

相关人物

相关产品查看全部 (1)

相关技术查看全部 (1)