A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting

arXiv cs.AI | 2026-06-03 | Author: Peiyan Zhang

Abstract

arXiv:2606.03280v1 (announce type: new)

Recent work shows that language models can transmit behavioural traits through hidden signals in generated data during training. We ask whether a more direct and stricter channel is also viable: can one language model communicate useful intermediate reasoning state to another at inference time by translating and injecting hidden activations, rather than by passing natural-language text? We test this question in a controlled Pythia-160M to Pythia-410M multi-hop reasoning setting. A linear translation layer learns a strong normalized-space map between sender and receiver hidden states, with normalized cosine similarity near 0.97 across seeds. However, when the translated activations are injected into the receiver at inference time, they do not improve downstream answering. Low-strength additive injection remains near the no-injection baseline, with confidence intervals that cross zero.
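The three operations the abstract names (a linear translation between hidden-state spaces, a cosine-similarity check, and low-strength additive injection) can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the weight matrix `W`, the bias `b`, the strength `alpha`, and the 2-dimensional and 3-dimensional vectors are all hypothetical stand-ins for real Pythia activations, and the map here is written by hand rather than learned.

```python
import math

def translate(h_sender, W, b):
    """Apply a linear map: out[j] = sum_i h_sender[i] * W[i][j] + b[j]."""
    return [
        sum(h * W[i][j] for i, h in enumerate(h_sender)) + b[j]
        for j in range(len(b))
    ]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def inject_additive(h_receiver, h_translated, alpha):
    """Additive injection: h' = h_receiver + alpha * h_translated."""
    return [r + alpha * t for r, t in zip(h_receiver, h_translated)]

# Toy example (assumed values): sender dimension 2, receiver dimension 3.
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]
h_sender = [1.0, 2.0]
h_receiver = [1.0, 2.0, 2.5]

h_translated = translate(h_sender, W, b)
similarity = cosine(h_translated, h_receiver)
h_injected = inject_additive(h_receiver, h_translated, alpha=0.1)
```

The sketch also makes the abstract's central point concrete: `similarity` measures only how closely the translated vector aligns with the receiver's own state, and says nothing about whether `h_injected` changes what the receiver does downstream.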
