Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models

arXiv cs.CL | 2026-06-03 | Authors: Muhammed Shahir Abdurrahman, Hashem Elezabi, Bruce Changlong Xu

Abstract

arXiv:2303.15619v2

The choice of which tokens to mask is a central, under-examined design decision in masked language modeling (MLM). Standard pretraining masks tokens uniformly at random, but several studies show that more informative masking targets can improve downstream performance. We study masking as a task-adaptive component of the fine-tuning pipeline and introduce Typhoon, a masking strategy that uses the gradient of the task loss with respect to one-hot token inputs to estimate, online, how much each token type contributes to the objective. Typhoon maintains an exponential moving average of per-token-type saliency and calibrates these scores into a masking distribution whose expected masking rate matches a target budget, under a token-independence approximation.
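The pipeline described in the abstract (per-token-type saliency, an exponential moving average, and calibration to a masking budget) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions. The per-position saliency values are taken as given inputs, standing in for a scalar reduction of the task-loss gradient with respect to the one-hot inputs; the abstract does not specify that reduction. The calibration rule (one scale factor `c` with clipping at 1, found by bisection) is one plausible choice that the abstract does not state. The names `update_ema`, `calibrate`, `sample_mask`, `decay`, `freqs`, and `budget` are hypothetical.

```python
import random


def update_ema(ema, token_ids, saliencies, decay=0.9):
    """Fold one batch of per-position saliency scores into a per-token-type EMA.

    `saliencies[i]` is an assumed scalar saliency for position i (for example,
    a gradient magnitude); how Typhoon derives it is not given in the abstract.
    """
    batch_sum, batch_count = {}, {}
    for tok, s in zip(token_ids, saliencies):
        batch_sum[tok] = batch_sum.get(tok, 0.0) + abs(s)
        batch_count[tok] = batch_count.get(tok, 0) + 1
    for tok, total in batch_sum.items():
        mean = total / batch_count[tok]
        # First observation initializes the average; later ones are blended in.
        ema[tok] = mean if tok not in ema else decay * ema[tok] + (1 - decay) * mean
    return ema


def calibrate(ema, freqs, budget=0.15, iters=60):
    """Scale saliencies into masking probabilities p[v] = min(1, c * ema[v]).

    `freqs[v]` is the relative frequency of token type v (summing to 1).
    Treating tokens as independent, the expected masking rate is
    sum_v freqs[v] * p[v]; bisection finds the c that makes it equal `budget`.
    """
    def expected_rate(c):
        return sum(f * min(1.0, c * ema.get(tok, 0.0)) for tok, f in freqs.items())

    lo, hi = 0.0, 1.0
    while expected_rate(hi) < budget and hi < 1e12:  # grow the upper bracket
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_rate(mid) < budget:
            lo = mid
        else:
            hi = mid
    return {tok: min(1.0, hi * ema.get(tok, 0.0)) for tok in freqs}


def sample_mask(token_ids, probs, rng):
    """Independently decide, per position, whether to mask that token."""
    return [rng.random() < probs.get(tok, 0.0) for tok in token_ids]


if __name__ == "__main__":
    ema = update_ema({}, [1, 2, 1, 3], [0.4, 0.1, 0.2, 0.0])
    freqs = {1: 0.2, 2: 0.3, 3: 0.5}
    probs = calibrate(ema, freqs, budget=0.15)
    print(probs)
    print(sample_mask([1, 2, 3, 1], probs, random.Random(0)))
```

In this sketch, token types with higher accumulated saliency receive proportionally higher masking probability, and types with zero saliency are never masked. If the budget cannot be reached (for instance, when most of the frequency mass has zero saliency), the probabilities saturate at 1 for the salient types and the achieved rate falls short of the target.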
