RePoT: Recoverable Program-of-Thought via Checkpoint Repair

arXiv cs.CL | 2026-05-29 | Author: Parsa Mazaheri

Abstract

arXiv:2605.30052v1 (cross-listed). One-shot Program-of-Thought (PoT) emits a Python program that prints a primitive-action plan; a single invalid action silently invalidates the trajectory. We introduce RePoT (Recoverable PoT): a deterministic verified replay that walks the plan through the environment to its first invalid transition, then one LLM call that resumes from the verified prefix. RePoT costs at most one extra LLM call on the ~14% of problems where PoT fails. RePoT beats PoT by +3 to +11pp across four closed-model configurations on PuzzleZoo-775 and peaks at 96.9% vs 86.3% on gpt-5.4-mini-medium; against the matched-budget PoT-retry baseline, RePoT wins decisively on Gemini (+3.8pp, 95% CI [+2.2,+5.
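
The replay-then-resume loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 3x3 grid environment, the names `step`, `verified_replay`, and `repot`, and the `resume` callable that stands in for the single extra LLM call.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, int]
Action = str

# Hypothetical toy environment: a 3x3 grid with four primitive moves.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state: State, action: Action, size: int = 3) -> Optional[State]:
    """Return the next state, or None if the transition is invalid."""
    if action not in MOVES:
        return None
    dx, dy = MOVES[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < size and 0 <= y < size:
        return (x, y)
    return None


def verified_replay(start: State, plan: List[Action]) -> Tuple[List[Action], State, bool]:
    """Walk the plan deterministically up to its first invalid transition.

    Returns the verified prefix, the state reached after that prefix,
    and a flag saying whether the whole plan was valid.
    """
    state = start
    for i, action in enumerate(plan):
        nxt = step(state, action)
        if nxt is None:
            return plan[:i], state, False
        state = nxt
    return list(plan), state, True


def repot(
    start: State,
    plan: List[Action],
    resume: Callable[[State, List[Action]], List[Action]],
) -> List[Action]:
    """Keep the verified prefix; call `resume` at most once if replay fails."""
    prefix, state, ok = verified_replay(start, plan)
    if ok:
        return prefix  # no extra call needed
    # `resume` is a placeholder for the one LLM call that continues
    # from the checkpoint (state reached by the verified prefix).
    return prefix + resume(state, prefix)
```

With a plan of `["right", "right", "right", "down"]` from `(0, 0)`, the third `right` leaves the grid, so replay stops with the prefix `["right", "right"]` at state `(2, 0)` and `resume` is invoked exactly once from that checkpoint. This sketch does not re-verify the resumed suffix; whether and how the paper does so is not stated in the abstract.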
