Voting with the Graph: Stable RLAIF via Topological Consistency Maximization
Event | BREAKTHROUGH | 2026-05-26 | Impact: HIGH
arXiv:2510.15514v3 (announce type: replace)
Abstract: Reinforcement Learning from AI Feedback (RLAIF) relies on LLM judges as preference measurement instruments, yet these instruments are fundamentally limited by random measurement errors: stochastic fluctuations that manifest as preference cycles (e.g., $A \succ B \succ C \succ A$), occurring in 5-9% of evaluations across state-of-the-art models. While repeated sampl
ArXiv CS.AI, 2026-05-26