On the impact of retrieved content representations in RAG Pipelines

arXiv cs.CL | 2026-06-01 | Authors: Jonathan J Ross, Bevan Koopman, Anton van der Vegt, Guido Zuccon

Abstract

arXiv:2605.30790v1 (announce type: cross)

Retrieval-Augmented Generation (RAG) supplements a language model's input with retrieved documents, yet most RAG pipelines inherit retrieval components designed for human readers. How retrieved content should be represented when the consumer is a large language model (LLM) rather than a human is less well understood. Recent work has proposed transformations of retrieved content and identified properties that affect generation, but each examines a single transformation or property in isolation, leaving open which features of a document's representation matter most. We address this with a controlled comparison: holding retrieval fixed, we vary only the representation of retrieved documents, comparing an original baseline against thirteen transformations spanning selection, summarisation, and reformulation, in query-dependent and query-independent variants.
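The controlled design described in the abstract (same retrieved documents, different representations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the prompt template, and the two toy selection transformations are hypothetical and are not the paper's thirteen transformations or its actual pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a transformation maps (query, document) to new text.
# Query-independent transformations simply ignore the query argument.
Transform = Callable[[str, str], str]


def original(query: str, doc: str) -> str:
    """Baseline: pass the retrieved document through unchanged."""
    return doc


def first_sentence(query: str, doc: str) -> str:
    """Toy query-independent selection: keep only the first sentence."""
    return doc.split(". ")[0].rstrip(".") + "."


def query_sentences(query: str, doc: str) -> str:
    """Toy query-dependent selection: keep sentences sharing a word with the query."""
    terms = set(query.lower().split())
    sentences = [s.strip().rstrip(".") for s in doc.split(". ") if s.strip()]
    kept = [s for s in sentences if terms & set(s.lower().split())]
    return ". ".join(kept) + "." if kept else ""


def build_prompts(
    query: str, retrieved: List[str], transforms: Dict[str, Transform]
) -> Dict[str, str]:
    """Hold the retrieved list fixed and vary only how each document is represented."""
    prompts = {}
    for name, transform in transforms.items():
        context = "\n\n".join(transform(query, d) for d in retrieved)
        # Assumed prompt template; the real one is not given in the abstract.
        prompts[name] = f"Context:\n{context}\n\nQuestion: {query}"
    return prompts


if __name__ == "__main__":
    docs = [
        "RAG adds retrieved documents to the prompt. "
        "It was designed with human readers in mind."
    ]
    conditions = {
        "original": original,
        "first_sentence": first_sentence,
        "query_sentences": query_sentences,
    }
    for name, prompt in build_prompts("what does rag add", docs, conditions).items():
        print(f"--- {name} ---\n{prompt}\n")
```

Because every condition receives the same `retrieved` list, any difference in downstream generation quality can be attributed to the representation rather than to retrieval. In a real study each prompt would then be sent to the same LLM and scored with the same metric.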
