Inference-Free Multimodal Learned Sparse Retrieval for Production-Scale Visual Document Search

ArXiv CS.CV, 2026-06-01 (news, en)

Authors: Gyu-Hwung Cho (NAVER Corp., Republic of Korea; Seoul National University, Republic of Korea), Youngjune Lee (NAVER Corp., Republic of Korea), Kiyoon Jeong (NAVER Corp., Republic of Korea), Siyoung Lee (NAVER Corp., Republic of Korea), Sanggyu Han (NAVER Corp., Republic of Korea), Hervé Dejean (Naver Labs Europe, France), Stéphane Clinchant (Naver Labs Europe, France), Seung-won Hwang (Seoul National University, Republic of Korea)

Abstract

arXiv:2605.30917v1 (announce type: cross)

As large-scale visual-document corpora such as arXiv papers and enterprise PDFs continue to grow, visual-document retrieval has gained increasing attention; yet it still lacks a deployable system that lexically indexes visual documents to serve queries without neural encoding at scale. Existing methods either achieve strong retrieval quality with VLM-based dense or multi-vector models but require neural query encoding at serving time, or avoid query encoding with OCR- or caption-based BM25 at the cost of time-consuming text extraction or generation. To fill this missing serving regime, we present V-SPLADE, an inference-free sparse retriever for visual-document retrieval. However, such inference-free multimodal learned sparse retrieval systems remain underexplored and have not yet shown dense-level effectiveness under high sparsity.
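To make the "inference-free" serving regime concrete, here is a minimal sketch of how such a retriever could answer a query with no neural encoding at query time. It is not the V-SPLADE implementation. It assumes that a document encoder has already produced sparse term weights offline, that the query is split on whitespace, and that every query term has weight 1. The names `doc_term_weights`, `build_inverted_index`, and `search`, along with all weights, are hypothetical.

```python
from collections import defaultdict

# Hypothetical offline output: sparse term weights per document, as a learned
# sparse document encoder might produce them. All values are made up.
doc_term_weights = {
    "doc_a": {"sparse": 1.2, "retrieval": 0.9, "index": 0.4},
    "doc_b": {"dense": 1.1, "retrieval": 0.5},
    "doc_c": {"ocr": 0.8, "bm25": 1.0},
}


def build_inverted_index(doc_term_weights):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, weights in doc_term_weights.items():
        for term, weight in weights.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query, top_k=3):
    """Score documents by summing stored weights of the matched query terms.

    No model runs here: the query is only lowercased and split on whitespace
    (an assumption of this sketch), and each query term counts with weight 1.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


index = build_inverted_index(doc_term_weights)
results = search(index, "sparse retrieval")
```

In this toy corpus, `doc_a` ranks ahead of `doc_b` because it matches both query terms, and `doc_c` is never scored because it shares no term with the query. All the learned signal lives in the stored document weights, which is why serving reduces to posting-list lookups.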
