Beyond Scalar Rewards: Dense Feedback for LLM Policy Synthesis in Sequential Social Dilemmas

arXiv cs.CL · 2026-06-02 · News · English · Author: Víctor Gallego
