Inference-Free Multimodal Learned Sparse Retrieval for Production-Scale Visual Document Search

ArXiv CS.CV, 2026-06-01

Authors: Gyu-Hwung Cho (NAVER Corp., Republic of Korea; Seoul National University, Republic of Korea), Youngjune Lee (NAVER Corp., Republic of Korea), Kiyoon Jeong (NAVER Corp., Republic of Korea), Siyoung Lee (NAVER Corp., Republic of Korea), Sanggyu Han (NAVER Corp., Republic of Korea), Hervé Dejean (Naver Labs Europe, France), Stéphane Clinchant (Naver Labs Europe, France), Seung-won Hwang (Seoul National University, Republic of Korea)

Abstract

arXiv:2605.30917v1 (announce type: cross)

As large-scale visual-document corpora such as arXiv papers and enterprise PDFs continue to grow, visual-document retrieval has gained increasing attention; yet it still lacks a deployable system that lexically indexes visual documents to serve queries without neural encoding at scale. Existing methods either achieve strong retrieval quality with VLM-based dense or multi-vector models but require neural query encoding at serving time, or avoid query encoding with OCR- or caption-based BM25 at the cost of time-consuming text extraction or generation. To fill this missing serving regime, we present V-SPLADE, an inference-free sparse retriever for visual-document retrieval. However, such inference-free multimodal learned sparse retrieval systems remain underexplored and have not yet shown dense-level effectiveness under high sparsity.
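The serving regime the abstract describes, where queries are answered from a lexical index with no neural encoder in the loop, can be illustrated with a minimal sketch. This is not the V-SPLADE implementation: the function names, the whitespace tokenizer, the uniform query-term weights, and the toy document weights are all assumptions made for illustration. The only idea taken from the abstract is that document-side term weights are computed offline and the query side needs no model call.

```python
from collections import defaultdict


def build_inverted_index(doc_term_weights):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, term_weights in doc_term_weights.items():
        for term, weight in term_weights.items():
            if weight > 0.0:  # keep only non-zero entries, so the index stays sparse
                index[term].append((doc_id, weight))
    return index


def search(index, query, top_k=3):
    """Score documents by summing the stored weights of matched query terms.

    The query is only lowercased and split on whitespace (an assumption);
    no neural model runs at query time.
    """
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical sparse term weights, standing in for what an offline
# document encoder might emit for three page images.
DOC_TERM_WEIGHTS = {
    "page_1": {"sparse": 1.25, "retrieval": 0.75, "index": 0.5},
    "page_2": {"invoice": 1.5, "total": 0.75},
    "page_3": {"retrieval": 0.5, "dense": 1.0},
}

INDEX = build_inverted_index(DOC_TERM_WEIGHTS)
RESULTS = search(INDEX, "sparse retrieval")
print(RESULTS)
```

In this sketch all neural cost sits in producing `DOC_TERM_WEIGHTS` at indexing time; serving reduces to postings lookups and additions, which is what lets such a retriever run on standard inverted-index infrastructure.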
