Done, But Not Sure: Disentangling World Completion from Self-Termination in Embodied Agents
Event: PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM
arXiv:2605.08747v4 | Announce Type: replace
Abstract: Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures--never completing the task, completing it but failing to stop, and reporting success without sufficient evidence--collapse into the same benchmark failur