DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning
Event: PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM
arXiv:2606.00160v1 | Announce Type: cross
Abstract: Large language models (LLMs) suffer from degraded safety capabilities even when fine-tuned with benign datasets. However, existing methods for identifying safety-degrading samples in benign datasets suffer from high computational costs and significant noise issues. In this paper, we propose DataShield to efficiently and effectively identify potential safety-degrad
Related coverage (1)
DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning
ArXiv CS.CL | 2026-06-02