REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2603.17145v2 (Announce Type: replace-cross)

Abstract: Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1 accuracy), thereby ignoring the ordinal structure inherent in regression tasks; for instance, they fail to recognize t
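
The abstract's point about binary rewards can be illustrated with a small sketch. This is not the paper's method: the function names, the 1-5 score scale, and the squared-error form of the second reward are all assumptions chosen only to show what "ignoring ordinal structure" means in practice.

```python
def binary_reward(pred: int, target: int) -> float:
    """0-1 accuracy: 1.0 only when the predicted score matches exactly."""
    return 1.0 if pred == target else 0.0


def distance_reward(pred: int, target: int, lo: int = 1, hi: int = 5) -> float:
    """Hypothetical ordinal-aware reward (an assumption, not from the paper):
    1 minus the squared error, normalized by the width of the score scale."""
    span = hi - lo
    return 1.0 - ((pred - target) / span) ** 2


target = 5
for pred in (5, 4, 1):
    # Compare how each reward treats an exact, a near, and a far prediction.
    print(pred, binary_reward(pred, target), distance_reward(pred, target))
```

Under the binary reward, a near miss (4 against a target of 5) and a far miss (1 against 5) both score 0.0, so the learner gets no signal that one is closer. The distance-based variant separates them, which is the kind of ordinal information a 0-1 reward discards.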

REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge · Related products