MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

ArXiv cs.CL, 2026-06-02. Authors: Aliaksei Korshuk, Alexander Buyantuev, Ilya Makarov

Abstract

arXiv:2606.00017v1 (announce type: cross)

Training language model agents for multi-agent strategic interaction presents a core difficulty: the quality of any action may depend on future events that never materialize, on moves that violate game rules, or on decisions made by other players. Standard reinforcement learning assumes that rewards can be assigned at each step, but this assumption fails in settings where outcomes are entangled across time and agents. We introduce delayed per-step reward attribution with eligibility gating, an episode lifecycle and postprocessing pipeline that computes rewards only at episode end, propagates them back to originating steps according to task-specific semantics, and excludes steps that lack valid dependent information from training.
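To make the idea concrete, here is a minimal sketch of what delayed reward attribution with eligibility gating could look like. It is not the authors' pipeline: the `Step` record, the `rule_valid` and `outcome_observed` flags, the `attribute_rewards` function, and the discount-based propagation rule are all hypothetical stand-ins for the "task-specific semantics" the abstract mentions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    player: str
    action: str
    rule_valid: bool = True          # assumption: the move obeyed the game rules
    outcome_observed: bool = True    # assumption: the event this step depends on occurred
    reward: Optional[float] = None   # stays None until the episode has ended
    eligible: bool = False           # set by the gate during postprocessing


def attribute_rewards(
    steps: List[Step],
    final_scores: Dict[str, float],
    gamma: float = 1.0,
) -> List[Step]:
    """Assign rewards after the episode ends and return only trainable steps.

    Assumed propagation rule: each eligible step receives its player's
    final score, discounted by the number of steps remaining after it.
    """
    last = len(steps) - 1
    for t, step in enumerate(steps):
        # Eligibility gate: skip steps without valid dependent information.
        step.eligible = (
            step.rule_valid
            and step.outcome_observed
            and step.player in final_scores
        )
        if step.eligible:
            step.reward = final_scores[step.player] * gamma ** (last - t)
    return [s for s in steps if s.eligible]


if __name__ == "__main__":
    episode = [
        Step("A", "cooperate"),
        Step("B", "promise", outcome_observed=False),  # dependent event never happened
        Step("A", "illegal-move", rule_valid=False),   # rule violation
        Step("B", "defect"),
    ]
    for s in attribute_rewards(episode, {"A": 1.0, "B": -1.0}, gamma=0.5):
        print(s.player, s.action, s.reward)
```

In this sketch no step carries a reward while the episode is running; rewards are filled in by a single pass once the final scores are known, and gated steps are dropped from the training batch rather than assigned a zero or a penalty. Whether rule-violating moves should be excluded or penalized is a design choice the abstract does not settle.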
