Token-weighted Direct Preference Optimization with Attention
Event: PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM
arXiv:2605.21883v2 · Announce Type: replace
Abstract: Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existing token-level PO methods compute the token weights using either token-position-based heuristic functions or probability estimates giv
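To make the "all tokens treated equally" point concrete, here is a minimal sketch of the standard DPO loss on one preference pair, with an optional per-token weight vector. With no weights it reduces to plain DPO, where each token's log-ratio counts with weight 1. The function names, the `beta` default, and the idea of passing weights in as plain lists are assumptions made for illustration. How the paper derives its weights from attention is not described in the excerpt above, so the weights here are placeholder inputs and not the paper's method.

```python
import math
from typing import Optional, Sequence


def sequence_logratio(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum over tokens of log pi(y_t) - log pi_ref(y_t)."""
    if weights is None:
        # Standard DPO: every token contributes with weight 1.
        weights = [1.0] * len(policy_logps)
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))


def dpo_loss(
    chosen_policy: Sequence[float],
    chosen_ref: Sequence[float],
    rejected_policy: Sequence[float],
    rejected_ref: Sequence[float],
    beta: float = 0.1,  # assumed default, not taken from the paper
    chosen_weights: Optional[Sequence[float]] = None,    # hypothetical token weights
    rejected_weights: Optional[Sequence[float]] = None,  # hypothetical token weights
) -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = beta * (
        sequence_logratio(chosen_policy, chosen_ref, chosen_weights)
        - sequence_logratio(rejected_policy, rejected_ref, rejected_weights)
    )
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy per-token log-probabilities for a two-token chosen and rejected response.
chosen_policy, chosen_ref = [-1.0, -0.5], [-1.0, -1.5]
rejected_policy, rejected_ref = [-2.0, -1.0], [-1.0, -1.0]

unweighted = dpo_loss(chosen_policy, chosen_ref, rejected_policy, rejected_ref)
weighted = dpo_loss(
    chosen_policy, chosen_ref, rejected_policy, rejected_ref,
    chosen_weights=[0.0, 2.0],  # put all emphasis on the second chosen token
)
```

In this toy pair, only the second chosen token differs between the policy and the reference, so shifting weight onto it widens the preference margin and lowers the loss relative to the unweighted call.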
Related coverage
Token-weighted Direct Preference Optimization with Attention
arXiv cs.CL · 2026-05-27