Rescaling MLM-Head for Neural Sparse Retrieval

Details

Source site: ArXiv CS.AI
Authors: Youngjoon Jang, Seongtae Hong, Jonah Turner, Heuiseok Lim
Article type: NEWS
Language: en
Published: 2026-06-18

Abstract

arXiv:2606.18811v1. Announce type: cross. Abstract: Learned sparse retrieval (LSR) models such as SPLADE have traditionally used BERT-style masked language models as backbone encoders. A natural expectation is that replacing BERT with stronger pretrained encoders should improve retrieval effectiveness. However, we find that under standard SPLADE training recipes, backbones with large MLM-head L2 norms can suffer performance degradation and even training collapse. We identify this failure as a scale mismatch in the MLM head: SPLADE directly uses MLM-head outputs to construct sparse lexical representations, and query-document relevance is computed by an unnormalized dot product over these representations. As a result, an inflated MLM-head scale can amplify sparse activations, distort matching scores, and destabilize contrastive training under common training settings.
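The scale sensitivity described in the abstract can be illustrated with a small sketch. It assumes the commonly used SPLADE pooling, where each vocabulary weight is the maximum over tokens of log(1 + ReLU(logit)), and it scores a query-document pair with a plain dot product. The toy logits, the scale factor of 10, and the helper names (splade_vector, dot, scale) are hypothetical and chosen only for illustration. The sketch shows the mismatch itself, not the rescaling method the paper proposes, which the abstract does not specify.

```python
import math

def splade_vector(logits):
    """Build a sparse lexical vector from per-token MLM-head logits.

    logits: list of tokens, each a list of vocabulary logits.
    Each vocabulary weight is max over tokens of log(1 + ReLU(logit)).
    """
    vocab_size = len(logits[0])
    return [
        max(math.log1p(max(tok[j], 0.0)) for tok in logits)
        for j in range(vocab_size)
    ]

def dot(u, v):
    """Unnormalized dot product used as the relevance score."""
    return sum(a * b for a, b in zip(u, v))

def scale(logits, s):
    """Multiply every logit by s, mimicking a larger MLM-head scale."""
    return [[s * x for x in tok] for tok in logits]

# Hypothetical toy logits: 2 tokens x 4 vocabulary terms.
query_logits = [[1.0, -0.5, 0.2, 0.0], [0.3, 0.1, -1.0, 0.0]]
doc_logits = [[0.8, 0.4, -0.2, 0.1], [0.5, -0.3, 0.6, 0.0]]

base_score = dot(splade_vector(query_logits), splade_vector(doc_logits))

# Same logits with an inflated scale (factor 10 is an arbitrary assumption).
inflated_score = dot(
    splade_vector(scale(query_logits, 10.0)),
    splade_vector(scale(doc_logits, 10.0)),
)

print(f"base score:     {base_score:.3f}")
print(f"inflated score: {inflated_score:.3f}")
```

Because the dot product is not normalized, every active term grows with the logit scale and the matching score grows with it. The log saturation dampens this growth but does not remove it, which is consistent with the abstract's point that an inflated head scale can distort scores during contrastive training.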
