Janus: A Benchmark for Goal-Conditioned Information Distortion in LLMs 文章

ArXiv CS.CL2026-06-10NEWSen作者: Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou

详细信息

来源站点
ArXiv CS.CL
作者
Polydoros Giannouris, Mohsinul Kabir, Sophia Ananiadou
文章类型
NEWS
语言
en
发布日期
2026-06-10

摘要

arXiv:2606.10852v1 Announce Type: new Abstract: LLM deception is often evaluated through direct markers such as fabricated claims, explicit lies, or strategic concealment. However, many real-world misleading communications do not depend on false statements, rather, they arise from selective treatment of true material facts: omitting adverse evidence, softening unfavorable details, emphasizing favorable details, or replacing precise qualifications with vague language. Existing benchmarks largely miss this subtler and arguably more dangerous failure mode. We introduce JANUS, a benchmark for measuring goal-conditioned pragmatic distortion in fact-grounded LLM outputs. Each scenario in our benchmark provides a fixed pool of favorable and adverse facts and compares a neutral condition against a goal-directed condition, such as increasing adoption, enrollment, approval, or support, despite potential harm to directly affected individuals or groups.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据