DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

ArXiv CS.CL | 2026-05-29 | Authors: Shuai Wang, Yu Yin, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Abstract

arXiv:2605.07210v2. This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled vector. This ignores how DLMs are trained to generate responses through masked-position prediction under bidirectional attention, a capability that can provide stronger retrieval signals. We propose DiffRetriever, which uses the DLM's native masked-position prediction directly for retrieval. For each query or passage, DiffRetriever appends one or more masked positions, using the outputs as retrieval representations in a single forward pass. With one masked position, single-representation DiffRetriever already improves over DiffEmbed on the same backbones.
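The mechanism described in the abstract (append masked positions, run one forward pass, read the outputs at those positions) can be illustrated with a small toy sketch. This is not the authors' implementation: the one-layer bidirectional attention below is a stand-in for a real DLM backbone, and `MASK_ID`, `make_embeddings`, `encode_with_masks`, `encode_mean_pooled`, and the dot-product scoring are all hypothetical choices made only for illustration.

```python
import math
import random

MASK_ID = 0  # assumption: id of the mask token in this toy vocabulary
DIM = 8      # assumption: toy hidden size


def make_embeddings(vocab_size, dim=DIM, seed=0):
    """Random token embeddings; a stand-in for a trained DLM's parameters."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, embeddings):
    """Look up token embeddings and add a simple sinusoidal position signal."""
    return [
        [e + math.sin((pos + 1) / (d + 1)) for d, e in enumerate(embeddings[tok])]
        for pos, tok in enumerate(token_ids)
    ]


def bidirectional_layer(vectors):
    """One unmasked self-attention step: every position attends to all positions."""
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(q))
        ])
    return outputs


def encode_with_masks(token_ids, embeddings, num_masks=1):
    """Append mask tokens, run ONE forward pass, return the outputs at the masks."""
    ids = list(token_ids) + [MASK_ID] * num_masks
    hidden = bidirectional_layer(embed(ids, embeddings))
    return hidden[-num_masks:]


def encode_mean_pooled(token_ids, embeddings):
    """Baseline for contrast: a single mean-pooled vector over all positions."""
    hidden = bidirectional_layer(embed(token_ids, embeddings))
    return [sum(h[d] for h in hidden) / len(hidden) for d in range(len(hidden[0]))]


def dot(u, v):
    """Similarity score between two representations (assumed scoring function)."""
    return sum(a * b for a, b in zip(u, v))


emb = make_embeddings(vocab_size=50)
query_ids = [5, 12, 7]
passage_ids = [12, 7, 31, 9]

# Single-representation variant: one masked position per text.
q_rep = encode_with_masks(query_ids, emb, num_masks=1)[0]
p_rep = encode_with_masks(passage_ids, emb, num_masks=1)[0]
score = dot(q_rep, p_rep)

# Multi-representation variant: several masked positions, still one forward pass.
p_reps = encode_with_masks(passage_ids, emb, num_masks=3)
```

The point of the sketch is the shape of the computation: the masked positions sit in the same sequence as the text, so bidirectional attention lets each one aggregate the whole input, and requesting more representations only lengthens the sequence rather than adding forward passes. How multiple representations are combined into a score is not stated in the abstract, so the sketch stops at extracting them.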
