PaperBench: Evaluating AI’s Ability to Replicate AI Research (Article)

OpenAI Blog | 2025-04-02 | Blog | en

Abstract

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.

Related companies

No data available

Related people

No data available

Related technologies

No data available