Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models (Event)

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2512.00349v2 | Announce Type: replace

Abstract: Are frontier AI systems becoming more capable? Certainly. Yet such progress is not an unalloyed blessing but rather a Trojan horse: behind their performance leaps lie more insidious and destructive safety risks, namely deception. Unlike hallucination, which arises from insufficient capability and leads to mistakes, deception represents a deeper threat in which …