Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models 文章

ArXiv CS.AI2026-05-28NEWSen作者: Sitong Fang, Shiyi Hou, Kaile Wang, Boyuan Chen, Donghai Hong, Jiayi Zhou, Josef Dai, Yaodong Yang, Jiaming Ji

查看原文 →

关系图谱

摘要

arXiv:2512.00349v2 Announce Type: replace Abstract: Are frontier AI systems becoming more capable? Certainly. Yet such progress is not an unalloyed blessing but rather a Trojan horse: behind their performance leaps lie more insidious and destructive safety risks, namely deception. Unlike hallucination, which arises from insufficient capability and leads to mistakes, deception represents a deeper threat in which models deliberately mislead users through complex reasoning and insincere responses. As system capabilities advance, deceptive behaviours have spread from textual to multimodal settings, amplifying their potential harm. First and foremost, how can we monitor these covert multimodal deceptive behaviors? Nevertheless, current research remains almost entirely confined to text, leaving the deceptive risks of multimodal large language models unexplored.

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术