Training Deliberative Monitors for Black-Box Scheming Detection

arXiv cs.CL · 2026-05-29 · Authors: Aditya Sinha, Akshat Naik, Victor Gillioz, Simon Storf, Kilian Merkelbach, Rich Barton-Cooper, Axel Højmark, Marius Hobbhahn

Training Deliberative Monitors for Black-Box Scheming Detection · Related Techniques