Auto-Discovery-Bench: Diagnosing Structured State Tracking in Oracle-Guided Discovery
Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2502.15224v2 Announce Type: replace-cross
Abstract: Interactive discovery requires agents to maintain and update structured beliefs over many rounds of feedback. Before evaluating agents in noisy, open-ended scientific environments, it is useful to isolate this prerequisite capability under controlled conditions. We introduce Auto-Discovery-Bench, a deterministic oracle-guided diagnostic benchmark in whi