Abstract
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.
Related events (2)
PaperBench: Evaluating AI’s Ability to Replicate AI Research
2025-04-02 | BREAKTHROUGH | Impact: HIGH
PaperBench: Evaluating AI’s Ability to Replicate AI Research
2025-04-02 | PRODUCT_LAUNCH | Impact: MEDIUM