Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

arXiv cs.CL, 2026-06-08. Authors: Shihao Zhang, Xiaoman Wang, Yuan Liu, Yunshi Lan, Weining Qian

Abstract

arXiv:2606.06825v1

Reinforcement learning has recently shown promise in improving large language models for Text-to-SQL generation, yet existing methods typically optimize one-shot rewards defined over a single SQL state. Such rewards provide limited guidance for iterative SQL correction and are insufficient to capture the improvement of multi-turn SQL refinement. In this paper, we propose Progress-SQL, a multi-turn reinforcement learning framework with progressive rewards for Text-to-SQL. Our approach introduces an Oracle-guided Diagnostic Tree (ODT), which abstracts SQL queries into clause-level structural profiles and produces diagnostic feedback for next-turn refinement. To provide dense and robust reward signals, we combine ODT-based structural alignment with lexical alignment and define a progressive reward that measures the improvement from the initial SQL to the final SQL.
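The abstract does not give the reward formula or the internals of the ODT, so the following is only a minimal illustrative sketch of the general idea: score a SQL query against the oracle (gold) query by mixing a clause-level structural score with a token-level lexical score, then reward the difference between the final and the initial query. Everything here is an assumption made for illustration: the regex-based `clause_profile` is a crude stand-in for the ODT, Jaccard overlap is a stand-in for both alignment measures, and the names `alignment`, `progressive_reward`, and the weight `w_struct` are hypothetical.

```python
import re

# Top-level clause keywords recognised by this naive splitter (no subquery handling).
_CLAUSE_RE = re.compile(
    r"\b(select|from|where|group\s+by|having|order\s+by|limit)\b", re.IGNORECASE
)


def tokens(sql):
    """Lowercased set of identifiers, numbers, comparison operators and '*'."""
    return set(re.findall(r"[a-z_][a-z0-9_]*|\d+|[<>=!]+|\*", sql.lower()))


def clause_profile(sql):
    """Map each clause keyword to the token set of its body (stand-in for an ODT profile)."""
    profile = {}
    matches = list(_CLAUSE_RE.finditer(sql))
    for i, m in enumerate(matches):
        name = re.sub(r"\s+", " ", m.group(1).lower())
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sql)
        profile.setdefault(name, set()).update(tokens(sql[m.end():end]))
    return profile


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as a perfect match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_alignment(pred, gold):
    """Mean per-clause overlap over every clause present in either query."""
    p, g = clause_profile(pred), clause_profile(gold)
    names = set(p) | set(g)
    if not names:
        return 1.0
    return sum(jaccard(p.get(n, set()), g.get(n, set())) for n in names) / len(names)


def lexical_alignment(pred, gold):
    """Token-level overlap, ignoring clause structure."""
    return jaccard(tokens(pred), tokens(gold))


def alignment(pred, gold, w_struct=0.5):
    """Weighted mix of structural and lexical alignment (weight is an assumption)."""
    return (w_struct * structural_alignment(pred, gold)
            + (1 - w_struct) * lexical_alignment(pred, gold))


def progressive_reward(initial_sql, final_sql, gold_sql, w_struct=0.5):
    """Improvement in alignment from the first-turn SQL to the last-turn SQL."""
    return alignment(final_sql, gold_sql, w_struct) - alignment(initial_sql, gold_sql, w_struct)


gold = "SELECT name FROM users WHERE age > 30"
first_turn = "SELECT name FROM users"
reward = progressive_reward(first_turn, gold, gold)
```

In this toy case the first turn omits the `WHERE` clause and the final turn recovers it, so the reward is positive; a refinement that leaves the query unchanged scores zero, and one that moves away from the gold query scores negative. That sign behaviour is the property a one-shot reward over a single SQL state cannot express.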