Reinforcement learning with prediction-based rewards 文章

OpenAI Blog2018-10-31BLOGen

摘要

We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time exceeds average human performance on Montezuma’s Revenge.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据