DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

ArXiv CS.CL, 2026-05-29
Authors: Shuai Wang, Yu Yin, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Abstract

arXiv:2605.07210v2

This paper shows how diffusion language models (DLMs) can be used as effective and efficient retrievers. Existing DLM-based retrievers (e.g., DiffEmbed) follow BERT-style encoding, representing each query or passage as a single mean-pooled vector. This ignores how DLMs are trained to generate responses through masked-position prediction under bidirectional attention, a capability that can provide stronger retrieval signals. We propose DiffRetriever, which uses the DLM's native masked-position prediction directly for retrieval. For each query or passage, DiffRetriever appends one or more masked positions, using the outputs as retrieval representations in a single forward pass. With one masked position, single-representation DiffRetriever already improves over DiffEmbed on the same backbones.
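To make the contrast in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It compares a mean-pooled representation with one read out at appended masked positions. Everything in it is an assumption made for illustration: the hash-based `embed` function, the single-layer `encode` stand-in for a DLM with bidirectional attention, the `[MASK]` string, and the dot-product score. The abstract does not specify the mask token, the prompt format, or how scores are computed.

```python
import hashlib
import math

DIM = 8          # toy hidden size (assumption)
MASK = "[MASK]"  # placeholder mask token (assumption)


def embed(token, position):
    """Toy deterministic embedding; a hypothetical stand-in for DLM input embeddings."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    base = [(b - 127.5) / 127.5 for b in digest[:DIM]]
    # Small positional term so repeated tokens (e.g. several masks) differ.
    return [x + 0.1 * math.sin(position + i) for i, x in enumerate(base)]


def encode(tokens):
    """One toy bidirectional self-attention layer: each position attends to all positions."""
    vecs = [embed(tok, pos) for pos, tok in enumerate(tokens)]
    hidden = []
    for query in vecs:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(DIM) for key in vecs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        hidden.append(
            [sum(w * v[i] for w, v in zip(weights, vecs)) / total for i in range(DIM)]
        )
    return hidden


def mean_pooled(tokens):
    """BERT-style encoding: average the hidden states of all input positions."""
    hidden = encode(tokens)
    return [sum(h[i] for h in hidden) / len(hidden) for i in range(DIM)]


def masked_readout(tokens, num_masks=1):
    """Append masked positions and return their hidden states from one forward pass."""
    hidden = encode(tokens + [MASK] * num_masks)
    return hidden[len(tokens):]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


query = "how do diffusion language models retrieve".split()
passage = "diffusion language models predict masked positions".split()

# Single-representation case: one masked position per query and per passage.
q_vec = masked_readout(query, num_masks=1)[0]
p_vec = masked_readout(passage, num_masks=1)[0]
score = dot(q_vec, p_vec)

# Mean-pooled baseline for comparison.
baseline_score = dot(mean_pooled(query), mean_pooled(passage))
```

In this sketch the only difference between the two approaches is where the representation is read: `mean_pooled` averages over the input positions, while `masked_readout` takes the hidden states at the appended positions, which see the whole input through bidirectional attention. Passing `num_masks` greater than one yields several vectors from the same forward pass.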

Related events

DiffRetriever method proposed
BREAKTHROUGH, impact: medium
