Scaling laws for reward model overoptimization

OpenAI Blog, 2022-10-19