摘要
arXiv:2601.03134v2 Announce Type: replace Abstract: As LLMs gain persuasive capabilities through extended dialogues, they create new opportunities for studying adversarial conversational behavior in extended interaction settings that traditional single-turn safety evaluations fail to capture. We systematically study these interactional dynamics using a controlled LLM-to-LLM simulation framework for automated red-teaming across bilingual social engineering scenarios. Evaluating eight state-of-the-art models in English and Chinese, we analyze dialogue-level outcomes, annotate attacker and defender strategy families, and model interaction dynamics between them. Results show that multi-turn adversarial dialogues follow recurrent escalation patterns, while defensive responses frequently rely on verification, delay, and channel control.
相关事件查看全部 (2)
相关公司
暂无数据
相关人物
暂无数据
相关产品
暂无数据