Subliminal Learning is a LoRA Artifact (event)

Type: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM

arXiv:2606.00831v1 | Announce Type: new

Abstract: Subliminal learning is a phenomenon in which language models can transmit behavioral traits to other models through seemingly innocuous data (Cloud et al., 2025). In subliminal learning, a teacher model with a behavioral trait (e.g., an obsession with cats) can transmit that cat obsession to a student model finetuned only on numerical sequences generated by the teacher. In this paper, we ask: how does this unexpecte
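To make "seemingly innocuous data" concrete, the sketch below shows one way a student's training set could be restricted to pure numerical sequences produced by a teacher. It is a minimal illustration under stated assumptions, not the pipeline used in this paper or by Cloud et al. (2025): the accepted format (1 to 20 integers between 0 and 999, comma-separated) and the helper names `is_numeric_sequence` and `build_student_dataset` are hypothetical.

```python
import re

# Hypothetical accepted format: 1-20 integers of up to 3 digits,
# separated by commas with optional surrounding whitespace.
_NUMERIC_SEQ = re.compile(r"\s*\d{1,3}(?:\s*,\s*\d{1,3}){0,19}\s*")


def is_numeric_sequence(text: str) -> bool:
    """Return True only if the whole text is a comma-separated list of small integers."""
    return _NUMERIC_SEQ.fullmatch(text) is not None


def build_student_dataset(prompts, completions):
    """Pair each prompt with the teacher's completion, dropping any completion
    that contains anything other than a numeric sequence (hypothetical filter)."""
    return [
        {"prompt": p, "completion": c.strip()}
        for p, c in zip(prompts, completions)
        if is_numeric_sequence(c)
    ]


if __name__ == "__main__":
    prompts = ["Continue: 3, 7, 12", "Continue: 5, 9"]
    completions = ["18, 25, 31", "I love cats! 14, 20"]
    # Only the first pair survives; the second mentions the trait in words.
    print(build_student_dataset(prompts, completions))
```

A filter like this checks only the surface form of the data, so no explicit mention of the trait reaches the student. What makes subliminal learning surprising is that data passing such a check can still transmit the teacher's trait.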