RealityTest: How People Probe AI Identity and Whether Models Disclose It

arXiv cs.CL, 2026-06-02. Authors: Anna Gausen, Sarenne Wallbridge, Bessie O'Dell, Christopher Summerfield, Hannah Rose Kirk

Abstract

arXiv:2606.00168v1. AI systems are increasingly deployed in conversational settings where users may be uncertain whether they are speaking with a human or an AI. Despite mounting regulatory attention to this known safety risk, existing evaluations of AI disclosure are typically English-only, based on machine-generated questions, and restricted to text. We present RealityTest to comprehensively test whether AI systems disclose their identity when asked. The benchmark is the first large-scale multimodal and multilingual evaluation, grounded in human data on how people actually encounter and question AI identity in the real world. Alongside the benchmark, we release the underlying dataset of 3,152 identity-probing queries collected from ~750 participants across 49 countries and five languages, in text and speech scenarios.
