Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

ArXiv CS.AI · 2026-05-29 · Authors: Xingguo Chen, Yuchen Shen, Shangdong Yang, Chao Li, Guang Yang, Wenhao Wang
