SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection · Event

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2605.28030v1 · Announce Type: cross

Abstract: Fine-tuning large language models often undermines their safety alignment, a problem further amplified by harmful fine-tuning attacks, in which adversarial data removes safeguards and induces unsafe behaviors. We propose SPARD, a defense framework that integrates Safety-Projected Alternating optimization with Relevance-Diversity-aware data selection…
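The abstract names SPARD's two ingredients but is cut off before describing them, so the following is only a generic sketch under stated assumptions, not SPARD's actual algorithm. It swaps in two well-known techniques: a PCGrad-style projection as a stand-in for "safety projection" (the part of a task gradient that would raise a safety loss is removed), and greedy maximal marginal relevance (MMR) as a stand-in for relevance-diversity data selection. The function names, the `reference` embedding (e.g. a centroid of trusted task data) and the `lam` trade-off parameter are hypothetical.

```python
import numpy as np


def project_conflicting_gradient(task_grad: np.ndarray, safety_grad: np.ndarray) -> np.ndarray:
    """PCGrad-style projection (assumed stand-in, not SPARD's exact rule).

    A descent step along -task_grad raises the safety loss to first order when
    task_grad . safety_grad < 0; in that case the conflicting component is
    removed, leaving a gradient orthogonal to safety_grad.
    """
    dot = float(np.dot(task_grad, safety_grad))
    norm_sq = float(np.dot(safety_grad, safety_grad))
    if dot >= 0.0 or norm_sq == 0.0:
        return task_grad.copy()  # no conflict: keep the task gradient unchanged
    return task_grad - (dot / norm_sq) * safety_grad


def _normalize(x: np.ndarray) -> np.ndarray:
    # Row-wise L2 normalisation so dot products become cosine similarities.
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def select_relevant_diverse(embeddings: np.ndarray, reference: np.ndarray,
                            k: int, lam: float = 0.7) -> list[int]:
    """Greedy MMR selection of k sample indices (hypothetical helper).

    Relevance = cosine similarity to the reference embedding;
    redundancy = max cosine similarity to samples already chosen.
    Higher lam favours relevance, lower lam favours diversity.
    """
    emb = _normalize(embeddings)
    ref = _normalize(reference.reshape(1, -1))[0]
    relevance = emb @ ref
    selected: list[int] = []
    candidates = list(range(len(emb)))
    while candidates and len(selected) < k:
        if selected:
            redundancy = (emb[candidates] @ emb[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1.0 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected


# Usage: a conflicting task gradient loses its component along -safety_grad.
print(project_conflicting_gradient(np.array([1.0, -1.0]), np.array([0.0, 1.0])))  # [1. 0.]

# Samples 0 and 1 are near-duplicates; a diversity-leaning lam skips sample 1.
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(select_relevant_diverse(emb, np.array([1.0, 0.0]), k=2, lam=0.3))  # [0, 2]
```

Raising `lam` (e.g. to 0.9) makes the same call return `[0, 1]`, since relevance then outweighs the near-duplicate penalty; how SPARD actually balances the two criteria, and how it alternates the safety and task objectives, is not stated in the truncated abstract.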

Related Companies

Vance · Company
arXiv · Nonprofit
Pact · Nonprofit
Framework · Company
Lowe · Company
ACT · Nonprofit
VIA · Company