ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

arXiv cs.AI · 2026-06-02 · Author: Rodney Lafuente-Mercado

Abstract

arXiv:2606.00257v1 (announce type: cross)

Token-level credit assignment for language-model reinforcement learning is usually formulated as if the policy were fully trainable, yet practical LLM-RL pipelines often rely on parameter-efficient fine-tuning, especially LoRA. We argue that treating the two separately hides a structural failure mode. Under LoRA, the policy is restricted to a low-rank neighborhood of the reference model, so the per-token output-distribution differences used by common intrinsic credit signals (surprisal, entropy reduction, and policy divergence) can become degenerate after within-trajectory normalization: the resulting weights either approach uniform or concentrate on a small set of task-agnostic positions. We formalize this behavior and propose measuring it directly with concentration diagnostics such as weight Gini and effective-token ratio.
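The abstract names weight Gini and effective-token ratio as concentration diagnostics but does not define them. The following is a minimal sketch under stated assumptions, not the paper's implementation: the per-token signal is assumed non-negative, within-trajectory normalization is assumed to be division by the sum, Gini is the standard Gini coefficient of the weights, and the effective-token ratio is taken to be the inverse participation ratio divided by the trajectory length. The function names are hypothetical.

```python
def normalize_within_trajectory(signal):
    """Scale a non-negative per-token signal so the weights sum to 1.

    Assumption: sum-normalization stands in for whatever scheme a given
    credit-assignment method actually uses.
    """
    values = [float(x) for x in signal]
    if not values:
        raise ValueError("signal must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("signal must be non-negative")
    total = sum(values)
    n = len(values)
    if total == 0.0:
        # An all-zero signal carries no preference: fall back to uniform.
        return [1.0 / n] * n
    return [v / total for v in values]


def weight_gini(weights):
    """Standard Gini coefficient: 0 for uniform weights, 1 - 1/n for one-hot."""
    w = sorted(float(x) for x in weights)
    n = len(w)
    total = sum(w)
    ranked = sum(rank * value for rank, value in enumerate(w, start=1))
    return 2.0 * ranked / (n * total) - (n + 1) / n


def effective_token_ratio(weights):
    """Inverse participation ratio over length (assumed definition).

    Equals 1 when every token gets equal weight and 1/n when one token
    gets all of it.
    """
    total = sum(weights)
    p = [float(w) / total for w in weights]
    return 1.0 / sum(x * x for x in p) / len(p)


# The two degenerate regimes described in the abstract, on a 4-token trajectory.
uniform = normalize_within_trajectory([1.0, 1.0, 1.0, 1.0])
peaked = normalize_within_trajectory([0.0, 0.0, 0.0, 5.0])
print(weight_gini(uniform), effective_token_ratio(uniform))
print(weight_gini(peaked), effective_token_ratio(peaked))
```

Under these definitions the near-uniform regime shows up as Gini close to 0 with an effective-token ratio close to 1, and the concentrated regime as Gini close to 1 - 1/n with a ratio close to 1/n. Neither number says whether the concentrated positions are task-relevant; that needs a separate check.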
