Subliminal Learning Is Steering Vector Distillation (Event)
OPEN_SOURCE | 2026-06-02 | Impact: MEDIUM
arXiv:2606.00995v1. Announce Type: new.
Abstract: Subliminal learning refers to a student language model acquiring a teacher's traits (e.g. a system-prompted preference for owls) when fine-tuned on the teacher's outputs, despite the outputs being semantically unrelated to those traits. It remains poorly understood how data without semantic meaning can transfer specific semantic traits. In this work, we show that subliminal learning is mediated b