UR$^2$: Unify RAG and Reasoning through Reinforcement Learning 文章

ArXiv CS.CL2026-06-03NEWSen作者: Weitao Li, Boran Xiang, Xiaolong Wang, Zhinan Gou, Weizhi Ma, Yang Liu

摘要

arXiv:2508.06165v5 Announce Type: replace Abstract: Large Language Models (LLMs) have shown strong capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG) for knowledge grounding and Reinforcement Learning from Verifiable Rewards (RLVR) for complex reasoning. However, existing attempts to unify these paradigms remain narrow in scope, typically limited to open-domain QA with fixed retrieval settings, which constrains generalization to broader domains. To address this limitation, we propose UR$^2$ (Unified RAG and Reasoning)), a general reinforcement learning framework that dynamically coordinates retrieval and reasoning. UR$^2$ introduces two key designs: a difficulty-aware curriculum that selectively invokes retrieval only for challenging instances, and a hybrid knowledge access strategy that combines domain-specific offline corpora with on-the-fly LLM-generated summaries.

UR$^2$: Unify RAG and Reasoning through Reinforcement Learning 文章

摘要

相关事件

相关公司

相关人物

相关产品查看全部 (2)

相关技术查看全部 (7)