When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2605.30719v1 | Announce Type: cross

Abstract: We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when can we replace classical RL algorithms with an LLM? We explore this question by introducing Prompted Policy Optimization (PromptPO), an iterative method that prompts an LLM with Python descriptions of the state space, action space, and r