Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models

ArXiv CS.CL, 2026-06-03. Authors: Muhammed Shahir Abdurrahman, Hashem Elezabi, Bruce Changlong Xu

Abstract

arXiv:2303.15619v2 (announce type: replace)

The choice of which tokens to mask is a central, under-examined design decision in masked language modeling (MLM). Standard pretraining masks tokens uniformly at random, but several studies show that more informative masking targets can improve downstream performance. We study masking as a task-adaptive component of the fine-tuning pipeline and introduce Typhoon, a masking strategy that uses the gradient of the task loss with respect to one-hot token inputs to estimate, online, how much each token type contributes to the objective. Typhoon maintains an exponential moving average of per-token-type saliency and calibrates these scores into a masking distribution whose expected masking rate matches a target budget, under a token-independence approximation.
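The two bookkeeping steps named in the abstract, the moving average and the budget calibration, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the calibration rule, so the sketch assumes one simple choice, namely mask probabilities proportional to saliency, clipped at 1, with the scale found by bisection. The names `ema_update`, `calibrate`, `decay`, and `budget` are hypothetical, and the saliency scores are assumed to be non-negative numbers computed elsewhere (for example, gradient magnitudes).

```python
from collections import Counter


def ema_update(saliency, batch_saliency, decay=0.9):
    """Blend new per-token-type saliency scores into a running average.

    Hypothetical helper: `decay` is an assumed smoothing factor.
    """
    updated = dict(saliency)
    for tok, value in batch_saliency.items():
        prev = updated.get(tok, value)  # first observation initialises the average
        updated[tok] = decay * prev + (1.0 - decay) * value
    return updated


def calibrate(saliency, token_ids, budget, iters=60):
    """Map saliency to per-type mask probabilities p = min(1, c * saliency).

    The scale c is chosen by bisection so that the expected fraction of masked
    tokens in `token_ids` is close to `budget`, treating each token as masked
    independently. This proportional rule is an assumption of the sketch.
    """
    counts = Counter(token_ids)
    n = len(token_ids)

    def expected_rate(c):
        total = sum(cnt * min(1.0, c * saliency.get(tok, 0.0))
                    for tok, cnt in counts.items())
        return total / n

    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the budget is reached (or give up).
    while expected_rate(hi) < budget and hi < 1e12:
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_rate(mid) < budget:
            lo = mid
        else:
            hi = mid
    return {tok: min(1.0, hi * saliency.get(tok, 0.0)) for tok in counts}


# Toy usage with made-up scores for three token types.
scores = ema_update({1: 0.5, 2: 0.1}, {3: 0.4})
token_ids = [1, 2, 2, 3, 1, 2]
probs = calibrate(scores, token_ids, budget=0.15)
rate = sum(probs[t] for t in token_ids) / len(token_ids)
```

In this sketch, token types with higher saliency receive higher mask probability while the average over the sequence stays at the budget. If too few token types have positive saliency, the budget cannot be reached and the returned probabilities simply saturate.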
