ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning (Event)

PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.23454v2 (announce type: replace)

Abstract: Rubric-based rewards offer a promising way to extend reinforcement learning (RL) for large language models beyond tasks with automatically verifiable answers. However, scaling rubric-based RL remains challenging: existing approaches often rely on expert-written rubrics and manually constructed question sets, while fixed task-level rubrics may fail to capture the evaluatio