Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

ArXiv CS.AI, 2026-05-26. Authors: Gianluca Sabatini, Chenhao Li, Marco Hutter

Abstract

arXiv:2605.24975v1 (announce type: cross)

Proximal Policy Optimization (PPO) has become the de facto standard for training legged robots, thanks to its robustness and scalability in massively parallel simulation environments like IsaacLab. However, its on-policy nature makes it inherently sample-inefficient, preventing its use for continuous adaptation and fine-tuning on real hardware. Soft Actor-Critic (SAC), by contrast, is an off-policy algorithm that can reuse past experience, making it a natural candidate for sim-to-real transfer workflows where the same algorithm can be used both in simulation and for online learning on the real robot. Despite these advantages, SAC has consistently failed to match PPO's empirical performance in massively parallel training settings.
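The reuse of past experience that separates off-policy methods such as SAC from on-policy methods such as PPO is usually implemented with a replay buffer. The sketch below is only an illustration of that general idea, not the authors' implementation: the `ReplayBuffer` class, its capacity, and the toy integer transitions are all hypothetical, and uniform sampling is assumed.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (hypothetical helper, not from the paper)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling (an assumption). A transition can reappear in later
        # batches, which is the data reuse an on-policy method does not get.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Toy data: integers stand in for robot states.
buffer = ReplayBuffer(capacity=1000)
for step in range(50):
    buffer.add(state=step, action=0.0, reward=1.0, next_state=step + 1, done=False)

# Several gradient updates can draw from the same 50 stored transitions.
batches = [buffer.sample(batch_size=8) for _ in range(4)]
```

An on-policy method would instead discard these transitions after the policy update that consumed them, which is why it needs fresh rollouts for every update.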