Retrying vs Resampling in AI Control · Event

Name: Retrying vs Resampling in AI Control
Start: 2026-05-26

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.26047v1 · Announce Type: new

Abstract: AI coding scaffolds like Claude Code and Codex use retrying: blocking actions flagged as risky and continuing the trajectory. We study retrying from an AI control perspective, which treats the model as potentially adversarial. We find that while retrying reduces honest suspicion scores, the untrusted model can exploit monitor rationale to construct sneakier attacks, negating safety gains. We also study

Artificial Intelligence

Relationship Graph

Retrying vs Resampling in AI Control · Related People

He Ma

AFE

can

S LI

Sam