Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

arXiv cs.CL | 2026-06-02 | Authors: Chungpa Lee, Jy-yong Sohn, Kangwook Lee

Abstract

arXiv:2602.23197v2

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning.
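For readers unfamiliar with the setting, the following is a minimal sketch of how a single linear attention layer can make a few-shot prediction from demonstrations. It assumes a parameterization common in the in-context learning theory literature (a merged key-query matrix and a value matrix acting on a prompt matrix, with no softmax); the abstract does not spell out the paper's exact model, so the function name `linear_attention_predict`, the prompt layout, and the 1/n scaling are illustrative assumptions rather than the authors' construction.

```python
import numpy as np


def linear_attention_predict(X, y, x_query, W_kq, W_v):
    """Few-shot prediction from one linear attention layer (no softmax).

    Assumed, illustrative parameterization (not taken from the paper):
      X: (n, d) demonstration inputs, y: (n,) demonstration labels,
      x_query: (d,) query input,
      W_kq: (d+1, d+1) merged key-query matrix,
      W_v:  (d+1, d+1) value matrix.
    """
    n, d = X.shape
    # Prompt matrix Z: one column per token; the query token's label slot is 0.
    Z = np.zeros((d + 1, n + 1))
    Z[:d, :n] = X.T
    Z[d, :n] = y
    Z[:d, n] = x_query
    # Linear attention: raw inner-product scores, scaled by 1/n.
    attn_out = W_v @ Z @ (Z.T @ W_kq @ Z) / n
    # Read the prediction from the label slot of the query token.
    return attn_out[d, n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 4
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true
    x_query = rng.normal(size=d)

    # Hand-picked weights: W_kq compares input parts, W_v copies labels.
    W_kq = np.zeros((d + 1, d + 1))
    W_kq[:d, :d] = np.eye(d)
    W_v = np.zeros((d + 1, d + 1))
    W_v[d, d] = 1.0

    print("few-shot prediction:", linear_attention_predict(X, y, x_query, W_kq, W_v))
    print("true label:         ", x_query @ w_true)
```

With these hand-picked weights the layer returns the average of `y_i * (x_i . x_query)` over the demonstrations, so the prediction depends on the examples in the prompt. In this sketch, restricting fine-tuning to the value matrix would mean holding `W_kq` fixed and updating only `W_v`.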
