Curriculum Learning for Safety Alignment · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.26315v1 · Announce Type: cross

Abstract: Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle and exhibits poor out-of-distribution (OOD) generalisation. In this paper, we investigate whether Curriculum Learning can improve the robustness of DPO-based safety alignment. We propose Staged-Competence, a curriculum-based framework that organises preference data by

Curriculum Learning for Safety Alignment · Related companies

arXiv (NONPROFIT)
ISES (NONPROFIT)
IREC (NONPROFIT)
Framework (COMPANY)
EARN (NONPROFIT)
Anis (NONPROFIT)
ACT (NONPROFIT)
Ratio (RESEARCH_INSTITUTE)
near (COMPANY)