A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

arXiv cs.AI | 2026-06-03 | Author: Peiyan Zhang

Abstract

arXiv:2606.03280v1. Announce type: new.

Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a more direct and stricter channel is also viable: can one language model communicate useful intermediate reasoning state to another at inference time by translating and injecting hidden activations, rather than by passing natural-language text? We test this question in a controlled Pythia-160M to Pythia-410M multi-hop reasoning setting. A linear translation layer learns a strong normalized-space map between sender and receiver hidden states, with normalized cosine similarity near 0.97 across seeds. However, when the translated activations are injected into the receiver at inference time, they do not improve downstream answering. Low-strength additive injection remains near the no-injection baseline, with confidence intervals that cross zero.
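
To make the pipeline in the abstract concrete, here is a minimal sketch of its three ingredients: a linear map fitted between normalized hidden states, a cosine-similarity check, and low-strength additive injection. It is not the author's implementation. The data are synthetic, the dimensions are toy values, per-feature standardization is only one possible reading of "normalized space", and the names `normalize`, `fit_linear_map`, `mean_cosine`, `inject`, and `alpha` are all hypothetical.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """Standardize each feature to zero mean and unit variance.

    Assumption: this is one plausible reading of "normalized space".
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + eps
    return (x - mean) / std


def fit_linear_map(sender, receiver):
    """Least-squares W such that sender @ W approximates receiver."""
    W, *_ = np.linalg.lstsq(sender, receiver, rcond=None)
    return W


def mean_cosine(a, b, eps=1e-8):
    """Mean row-wise cosine similarity between two activation matrices."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float((num / den).mean())


def inject(receiver_hidden, translated, alpha=0.1):
    """Additive injection: h' = h + alpha * translated (alpha is hypothetical)."""
    return receiver_hidden + alpha * translated


# Synthetic stand-ins for sender and receiver hidden states (toy sizes).
rng = np.random.default_rng(0)
n, d_sender, d_receiver = 512, 16, 24
sender = rng.normal(size=(n, d_sender))
true_map = rng.normal(size=(d_sender, d_receiver))
receiver = sender @ true_map + 0.1 * rng.normal(size=(n, d_receiver))

s_norm = normalize(sender)
r_norm = normalize(receiver)

W = fit_linear_map(s_norm, r_norm)          # translation layer
translated = s_norm @ W                     # sender states in receiver space
cos = mean_cosine(translated, r_norm)       # alignment quality
injected = inject(r_norm, translated, alpha=0.1)

print(f"mean cosine similarity: {cos:.3f}")
```

The sketch only measures how well the translated states align geometrically with the receiver's own states. As the abstract reports, high alignment of this kind does not by itself imply that injection improves downstream answers; that has to be measured separately against a no-injection baseline.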
