PaperBench: Evaluating AI’s Ability to Replicate AI Research (Event)

PRODUCT_LAUNCH | 2025-04-02 | Impact: MEDIUM

We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.