SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Event: PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.02302v1 (Announce Type: cross)

Abstract: Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited