Steering Language Models Before They Speak: Logit-Level Interventions

Event: PRODUCT_LAUNCH | Date: 2026-05-29 | Impact: MEDIUM

arXiv:2601.10960v2 Announce Type: replace

Abstract: Controllable generation requires language models to realize output characteristics such as reading level, politeness, and toxicity. Existing steering methods are often indirect, require access to internal activations, or depend on auxiliary trained models. We propose SWAI, a training-free inference-time method that addresses these limitations by steering directly in logit sp