The Distributed Detectability Band Against Marginal-Preserving Attacks
Event: PRODUCT_LAUNCH · 2026-06-10 · Impact: MEDIUM
arXiv:2606.10456v1 · Announce Type: cross
Abstract: AI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to beni
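
The Gaussian-copula AR(1) idea named in the abstract can be illustrated with a short sketch. This is not the paper's code: the function names, the parameter `rho`, and the Uniform(0, 1) stand-in for the preserved score marginal are all assumptions made for illustration. A latent standard-normal AR(1) series carries the serial correlation, and the normal CDF followed by a quantile function maps each step back onto the chosen marginal, so the per-step distribution is unchanged while the dependence between steps is not.

```python
import math
import random
from statistics import NormalDist


def sample_copula_ar1_scores(n_steps, rho, marginal_quantile, seed=0):
    """Sample scores with a fixed per-step marginal and AR(1) latent dependence.

    marginal_quantile is a hypothetical inverse CDF of the marginal to preserve;
    rho is the lag-1 correlation of the latent Gaussian series (an assumption).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    std_normal = NormalDist()
    innovation_scale = math.sqrt(1.0 - rho * rho)  # keeps Var(z_t) = 1
    z = rng.gauss(0.0, 1.0)  # start in the stationary N(0, 1) distribution
    latents, scores = [], []
    for _ in range(n_steps):
        latents.append(z)
        u = std_normal.cdf(z)                # uniform on (0, 1) at every step
        scores.append(marginal_quantile(u))  # map onto the preserved marginal
        z = rho * z + innovation_scale * rng.gauss(0.0, 1.0)
    return latents, scores


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    # Identity quantile: scores are Uniform(0, 1), an illustrative stand-in.
    latents, scores = sample_copula_ar1_scores(20000, 0.8, lambda u: u, seed=1)
    print("mean score:", sum(scores) / len(scores))
    print("latent lag-1 autocorrelation:", lag1_autocorrelation(latents))
```

With `rho = 0` the scores are independent draws from the chosen marginal; raising `rho` changes only how consecutive scores co-move, which a monitor looking at one step at a time cannot observe.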
Related coverage
The Distributed Detectability Band Against Marginal-Preserving Attacks
ArXiv CS.AI · 2026-06-10