Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training

Event type: PRODUCT_LAUNCH. Date: 2026-06-01. Impact: MEDIUM

arXiv:2605.11134v2 Announce Type: replace-cross

Abstract: Preference learning methods like Direct Preference Optimization (DPO) are known to induce reliance on spurious correlations, leading to sycophancy and length bias in today's language models and potentially severe goal misgeneralization in future systems. In this work, we provide a unified theoretical analysis of this phenom