SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills 文章

ArXiv CS.AI2026-05-26NEWSen作者: Yingtie Lei, Zhongwei Wan, Jiankun Zhang, Samiul Alam, Zixuan Zhong, Peizhou Huang, Xin Wang, Jingxuan Zhang, Donghao Zhou, Yunta Hsieh, Zhihao Dou, Hui Shen, Yan Xu, Dimitrios Dimitriadis, Tuo Zhang, Mi Zhang

查看原文 →

关系图谱

摘要

arXiv:2605.24117v1 Announce Type: new Abstract: Large language model (LLM) agents accumulate rich episodic trajectories while solving real-world tasks, but it remains unclear whether such experience can be distilled into reusable procedural skills. We introduce SkillEvolBench, a diagnostic benchmark for evaluating this step from experience reuse to skill formation. It contains 180 tasks across six real-world agent environments, organized into role-conditioned task families with shared latent procedures. Agents learn from acquisition tasks, update an external skill library using compacted trajectories and verifier feedback, and then face frozen deployment tasks testing context shift, adversarial shortcuts, and composition. By comparing self-generated and curated-start skill evolution against no-skill and raw-trajectory controls, SkillEvolBench separates procedural abstraction from base capability, curated prior knowledge, and direct reuse of episodic traces.

SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills 文章

摘要

相关事件查看全部 (3)

相关公司查看全部 (4)

相关人物

相关产品查看全部 (10)

相关技术查看全部 (29)