Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors


Details

Source site: ArXiv CS.CL
Authors: Francisco Valentini, Edgar Altszyler, Martin Fajcik
Article type: NEWS
Language: en
Publication date: 2026-06-03

Abstract

arXiv:2606.02814v1 (announce type: cross)

Neural retrievers are trained to estimate query-document relevance from annotated query-document pairs. Yet annotation protocols may not purely reflect relevance: they select only a subset of documents for labeling, and this selection can favor certain document types over others. We investigate whether supervised bi-encoder retrievers implicitly learn a document-level relevance prior: a query-independent signal encoded in their representation space as a side effect of training on annotated data. We estimate this prior by training simple classifiers on frozen document embeddings and evaluate three state-of-the-art retrievers across multiple IR benchmarks. We find that supervised neural retrievers encode relevance priors that generalize to unseen documents and are consistent across models. These priors create a findability gap: documents with a lower prior are systematically harder to retrieve, even when genuinely relevant.
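
The probing idea in the abstract (a simple classifier trained on frozen document embeddings) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the probe is a plain logistic regression, the "embeddings" are synthetic Gaussian vectors standing in for frozen retriever outputs, the label marks whether a document was ever annotated as relevant, and the names `make_docs`, `train_probe`, and `prior_score` are hypothetical.

```python
import math
import random

DIM = 8  # assumed embedding size; real retriever embeddings are far larger


def make_docs(n, label, rng):
    """Synthetic stand-in for frozen document embeddings (an assumption).

    Documents with label 1 are shifted along the first four dimensions,
    mimicking a query-independent direction in embedding space.
    """
    mu = 1.0 if label == 1 else -1.0
    return [[rng.gauss(mu, 1.0) if j < 4 else rng.gauss(0.0, 1.0)
             for j in range(DIM)] for _ in range(n)]


def prior_score(w, b, x):
    """Probe output in [0, 1]: estimated query-independent prior for one document."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def train_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe by batch gradient descent.

    Only the probe's weights are updated; the embeddings stay fixed.
    """
    w, b, n = [0.0] * DIM, 0.0, len(embeddings)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(embeddings, labels):
            err = prior_score(w, b, x) - y
            for j in range(DIM):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
train_x = make_docs(100, 1, rng) + make_docs(100, 0, rng)
train_y = [1] * 100 + [0] * 100
test_x = make_docs(100, 1, rng) + make_docs(100, 0, rng)
test_y = [1] * 100 + [0] * 100

w, b = train_probe(train_x, train_y)

# Held-out accuracy: does the probe generalize to unseen documents?
scores = [prior_score(w, b, x) for x in test_x]
accuracy = sum((s >= 0.5) == (y == 1) for s, y in zip(scores, test_y)) / len(test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In this toy setting the probe sees no queries at all, so any above-chance accuracy on held-out documents reflects a signal carried by the document embeddings alone. With a real retriever, the synthetic vectors would be replaced by its frozen document embeddings and the labels by whatever annotation-derived signal is being studied.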
