SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills 文章

ArXiv CS.AI2026-05-27NEWSen作者: Yinghan Hou, Zongyou Yang, Zaihu Pang, Xiujun Ma

摘要

arXiv:2604.06550v2 Announce Type: replace-cross Abstract: OpenClaw's ClawHub marketplace hosts tens of thousands of community-contributed agent skills (49,592 in our 2026-04-04 snapshot), and recent audits report that 13-26% contain security vulnerabilities. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural-language SKILL.md instructions that hide prompt injection and social engineering. Neither approach covers both modalities. SkillSieve is a three-layer detection framework that applies deeper analysis only where needed. Layer 1 runs regex, AST, and metadata checks through a recall-tuned heuristic scorer, filtering 86% of the volume. Layer 2 routes suspicious skills to an LLM, splitting the analysis into four parallel sub-tasks with structured outputs. Layer 3 puts high-risk skills before a jury of three LLMs that vote independently and debate when they disagree.

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills 文章

摘要

相关事件查看全部 (2)

相关公司查看全部 (4)

相关人物

相关产品查看全部 (12)

相关技术查看全部 (21)