ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

ArXiv CS.AI, 2026-05-27. Authors: Ruicheng Ao, David Simchi-Levi, Xinshang Wang

Abstract

arXiv:2601.21008v3 (announce type: replace-cross)

Operations Research practitioners debug infeasible models through an iterative process: inspecting Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and repairing formulations until feasibility is restored. Existing LLM benchmarks mostly treat OR as one-shot translation from problem descriptions to solver code, omitting this diagnostic loop. We formalize infeasible-model repair as a solver-in-the-loop Markov Decision Process in which each action triggers solver re-execution and IIS recomputation, yielding deterministic, verifiable feedback. We introduce ORLoopBench, a benchmark suite with two components: OR-Debug-Bench releases 5,362 LP/MILP repair instances, while OR-Bias-Bench evaluates closed-form operational decision rationality across inventory settings. Solver-verified RLVR training enables an 8B model to surpass frontier APIs on LP repair (95.3% vs 92.