Monitoring Agentic Systems Before They're Reliable

ArXiv CS.AI, 2026-06-02. Authors: Marisa Ferrara Boston, Glen Hanson, Effi Georgala, JD Hudgens, Heather Frase

Abstract

arXiv:2606.02494v1. Announce Type: cross.

Agentic systems entering production typically operate as partially integrated assemblies where structural defects, not task-level errors, dominate the failure landscape. At this maturity level, task-level error detection may be infeasible: structural failure modes mask the signal that task-level monitors are designed to detect. We present a monitoring and triage methodology that decomposes agentic system evaluation into three dimensions (quality, suitability, efficiency) at three monitoring scopes (within-run, cross-run, structural), using variance as a characterization signal. Findings are routed through severity classification adapted from FMEA, concentrating human attention on the subset that warrants investigation. We evaluate on a synthetic testbed of 220 runs across 120 document bundles with controlled error injection. Three results emerge.
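
The cross-run scope described in the abstract, with variance as the characterization signal, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Score` record, the [0, 1] score scale, the function names, and the severity band thresholds are all hypothetical, and the banding is only loosely in the spirit of an FMEA-style severity ranking.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pvariance

# The three evaluation dimensions named in the abstract.
DIMENSIONS = ("quality", "suitability", "efficiency")


@dataclass(frozen=True)
class Score:
    """One evaluated run of one document bundle (hypothetical record shape)."""
    bundle_id: str
    dimension: str
    value: float  # assumed to be normalized to [0, 1]


def cross_run_variance(scores):
    """Population variance of scores per (bundle, dimension) across runs.

    Groups with fewer than two runs are skipped, since a single run
    carries no cross-run variance information.
    """
    groups = defaultdict(list)
    for s in scores:
        if s.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {s.dimension}")
        groups[(s.bundle_id, s.dimension)].append(s.value)
    return {key: pvariance(vals) for key, vals in groups.items() if len(vals) >= 2}


def severity_band(variance, medium=0.02, high=0.10):
    """Map a variance to a coarse severity band (thresholds are assumptions)."""
    if variance >= high:
        return "high"
    if variance >= medium:
        return "medium"
    return "low"


def triage(variances, review_threshold=0.02):
    """Return (key, variance) pairs above the threshold, most variable first."""
    flagged = [(k, v) for k, v in variances.items() if v > review_threshold]
    return sorted(flagged, key=lambda kv: kv[1], reverse=True)
```

Calling `triage(cross_run_variance(scores))` on a list of `Score` records yields only the bundle and dimension pairs whose run-to-run spread exceeds the threshold, which is one simple way to concentrate human review on a subset of findings.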
