Entropy-aware Masking for Masked Language Modeling

ArXiv CS.CL | 2026-05-28 | NEWS | en | Authors: Gokul Srinivasagan, Kai Hartung, Munir Georges

Abstract

arXiv:2605.28526v1 (announce type: cross)

Masked language modeling has become a standard pretraining objective for encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them from the surrounding context. This process enables the model to capture both syntactic and semantic properties of language. Conventionally, the tokens to mask are chosen at random, which may not always yield the most effective learning signal. In this work, we examine a token masking strategy based on the entropy distribution. We use the model's entropy over token predictions to identify which tokens should be masked. This method aims to target tokens that are more informative and uncertain, in order to improve training efficacy. We also propose a novel self-masking approach that enhances training efficiency without relying on an external reference model.
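The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that per-position logits over the vocabulary are already available from some scoring model, and that the positions with the highest predictive entropy are the ones masked. The function names, the top-k selection rule, and the `mask_ratio` parameter are hypothetical choices made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def token_entropies(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position.

    `logits` is a list of per-position score lists, one score per vocabulary item.
    """
    entropies = []
    for row in logits:
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        probs = [e / z for e in exps]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return entropies


def select_mask_positions(logits, mask_ratio=0.15):
    """Return the sorted indices of the highest-entropy positions.

    Assumption: a fixed fraction `mask_ratio` of positions is masked and ties
    are broken by position order. Both are illustrative choices.
    """
    ents = token_entropies(logits)
    k = max(1, round(mask_ratio * len(ents)))
    ranked = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: 4 positions, vocabulary of 3 items.
toy_logits = [
    [10.0, 0.0, 0.0],  # confident prediction, low entropy
    [0.0, 0.0, 0.0],   # uniform prediction, maximal entropy
    [5.0, 0.0, 0.0],   # fairly confident
    [0.1, 0.0, 0.0],   # nearly uniform, high entropy
]
positions = select_mask_positions(toy_logits, mask_ratio=0.5)
```

In this toy input, the two positions with near-uniform predictions are selected, whereas uniform random masking would pick any two positions with equal probability. In the self-masking setting mentioned in the abstract, the logits would presumably come from the model being trained rather than from an external reference model, but how that is done is not described here.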

Related events

Entropy-aware Masking for Masked Language Modeling
2026-05-28 | PRODUCT_LAUNCH | Impact: MEDIUM
