Abstract
arXiv:2505.16014v5
Retrieval-Augmented Generation (RAG) systems deployed in sensitive domains must provide interpretable evidence selection and robust safeguards against data poisoning. Current approaches, however, rely on opaque similarity-based retrieval with arbitrary top-k cutoffs; they offer no explanation for their selections and remain vulnerable to adversarial manipulation. METEORA replaces re-ranking with rationale-driven selection via three components: a DPO-tuned LLM that generates explicit retrieval rationales, an Evidence Chunk Selection Engine (ECSE) that uses those rationales together with statistical elbow detection to determine the cutoff adaptively, and a Verifier LLM that filters poisoned evidence using the same rationales. Across six datasets, METEORA achieves 13.41% higher recall, 21.05% higher precision (without expansion), an 80% reduction in evidence volume, a 33.34% improvement in answer accuracy, and a 4.