Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study

arXiv cs.CL | 2026-06-01 | Authors: Xiaonan Xu, Wenjing Wu

Abstract

arXiv:2605.31408v1. Skill documents provide procedural knowledge to large-language-model agents at inference time. This article studies whether the presentation granularity of controlled skill knowledge changes downstream task success. The experiment uses a pinned SkillsBench version, a 30-task domain-balanced subset validated by official oracle runs, two reasoning-enabled model configurations (GPT-5.5 and DeepSeek V4-Flash), six skill conditions, and five trials per task-condition-model cell. The final data contain 1,800 rows, 900 for each model. The task is the unit of inference: the five trials are aggregated within each task-condition-model cell before paired contrasts are estimated over the 30 tasks. Skill availability is the clearest empirical signal. Relative to no skill, skill conditions increase task-mean pass rate by 26.7 to 36.0 percentage points for GPT-5.5 and by 18.0 to 26.0 percentage points for DeepSeek V4-Flash. By comparison, the primary presentation contrasts are smaller and uncertain.
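The aggregation described in the abstract (trials averaged within each task-condition-model cell, then paired contrasts taken over tasks) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the data are synthetic, the model and condition names are placeholders, the pass probabilities are made up, and the reading that "no skill" is one of the six conditions is inferred from the row counts rather than stated in the source. It is not the authors' analysis code.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical labels: the source names two models and six conditions,
# but does not list the condition names used here.
MODELS = ["model_a", "model_b"]
CONDITIONS = ["no_skill"] + [f"skill_{i}" for i in range(1, 6)]
N_TASKS, N_TRIALS = 30, 5


def make_synthetic_rows(seed=0):
    """Build one pass/fail row per model, task, condition, and trial."""
    rng = random.Random(seed)
    rows = []
    for model in MODELS:
        for task in range(N_TASKS):
            for cond in CONDITIONS:
                # Made-up pass probabilities, not results from the study.
                p = 0.3 if cond == "no_skill" else 0.6
                for trial in range(N_TRIALS):
                    rows.append({
                        "model": model,
                        "task": task,
                        "condition": cond,
                        "trial": trial,
                        "passed": int(rng.random() < p),
                    })
    return rows


def cell_pass_rates(rows):
    """Average the trials within each (model, task, condition) cell."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r["model"], r["task"], r["condition"])].append(r["passed"])
    return {key: mean(vals) for key, vals in cells.items()}


def paired_contrast(rates, model, cond, baseline="no_skill"):
    """Per-task difference (cond minus baseline), and its mean in percentage points."""
    diffs = [
        rates[(model, t, cond)] - rates[(model, t, baseline)]
        for t in range(N_TASKS)
    ]
    return 100 * mean(diffs), diffs


rows = make_synthetic_rows()
rates = cell_pass_rates(rows)
effect_pp, diffs = paired_contrast(rates, "model_a", "skill_1")
print(len(rows), len(rates), len(diffs))  # 1800 360 30
```

Under these assumptions the counts match the abstract: 2 models x 30 tasks x 6 conditions x 5 trials gives 1,800 rows (900 per model), and each contrast rests on 30 paired task-level differences rather than on 150 individual trials, which is what treating the task as the inference unit means in practice.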
