Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.29384v1. Announce Type: cross.

Abstract: We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen retrievers, Sparse Autoencoders without any retrieval-specific adjustments extract a latent vocabulary with approximately