AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety 文章

ArXiv CS.AI2026-06-04NEWSen作者: Yanjing Ren, Reza Ebrahimi, TengTeng Ma

摘要

arXiv:2606.04867v1 Announce Type: new Abstract: As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risk categories. The dataset contains 2,123 real-world Replika conversations collected from Reddit and annotated through human-AI collaboration across nine categories: sexual behavior, antisocial behavior, physical aggression, verbal aggression, substance abuse, self-harm and suicide, control, manipulation, and no-harm. Using this benchmark, we evaluate 20 state-of-the-art open-source and closed-source LLMs under an LLM-as-judge framework for detecting unsafe interactions.

相关人物

暂无数据

相关技术

暂无数据