Event: Token-weighted Direct Preference Optimization with Attention

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.21883v2. Announce Type: replace.

Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existing token-level PO methods compute the token weights using either token-position-based heuristic functions or probability estimates giv
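To make the "all tokens equally" point concrete, here is a minimal sketch of the standard per-pair DPO loss computed from per-token log-probabilities, with an optional per-token weight on each log-ratio. The function name `dpo_loss`, its parameters, and the weight arguments are assumptions introduced for illustration only. The abstract is cut off before it describes how the paper derives its weights, so the weights below are a generic placeholder and not the authors' attention-based scheme.

```python
import math


def dpo_loss(chosen_policy_logps, chosen_ref_logps,
             rejected_policy_logps, rejected_ref_logps,
             beta=0.1, chosen_weights=None, rejected_weights=None):
    """DPO loss for one preference pair, from per-token log-probabilities.

    With weights left as None every token contributes equally, which is
    standard DPO. Passing weights is a hypothetical token-weighted variant.
    """
    def weighted_log_ratio(policy_logps, ref_logps, weights):
        if weights is None:
            weights = [1.0] * len(policy_logps)  # uniform weights: plain DPO
        # Sum over tokens of w_t * (log pi_theta(y_t) - log pi_ref(y_t))
        return sum(w * (p - r)
                   for w, p, r in zip(weights, policy_logps, ref_logps))

    margin = beta * (
        weighted_log_ratio(chosen_policy_logps, chosen_ref_logps, chosen_weights)
        - weighted_log_ratio(rejected_policy_logps, rejected_ref_logps,
                             rejected_weights)
    )
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. Raising the policy's log-probability on chosen tokens, or lowering it on rejected tokens, pushes the loss below that value.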