CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

ArXiv CS.AI | 2026-06-06 | Authors: Zeyang Yue, Chenfei Yan, Feifei Zhao, Haibo Tong, Mengwen Xu, Xiaozhen Wang, Erliang Lin, Yi Zeng

Abstract

arXiv:2606.06099v1

Whether Large Language Models (LLMs) exhibit covert psychological manipulation in complex human-AI interactions has garnered increasing safety concerns. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, failing to capture the dynamic and covert nature of manipulative strategies in multi-turn dialogues. We introduce CogManip, a comprehensive benchmark that evaluates 15 manipulation strategy risks across 1,000 multi-turn interaction scenarios, validated by human experts. A systematic evaluation of 13 representative models, including frontier models like GPT-5.4 and DeepSeek-V3.2, reveals significant risk heterogeneities and illuminates the targeted direction for future defense. Further analysis of objective function perturbation reveals that DeepSeek-V3.
