Gram: Assessing sabotage propensities via automated alignment auditing

ArXiv CS.AI · 2026-05-29 · News · English · Authors: David Lindner, Victoria Krakovna, Sebastian Farquhar
