SPAR: Support-Preserving Action Rectification
Event: PRODUCT_LAUNCH | Date: 2026-05-28 | Impact: MEDIUM
arXiv:2605.27877v1 | Announce Type: cross
Abstract: Offline policy improvement faces an inherent conflict between maximizing value and fitting the data distribution. While in-sample weighted regression is stable, it suffers from over-conservatism that suppresses high-value actions in the distribution tail; conversely, gradient-based approaches often exhibit a fitting-optimization conflict of gradients, which drives the policy off the data manifold. To
ArXiv CS.AI, 2026-05-28