ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure

ArXiv CS.AI | 2026-05-29 | NEWS | en | Authors: A. J. Lew (Unreasonable Labs), Y. Cao (Unreasonable Labs), M. J. Buehler (Unreasonable Labs)

Details

Source site: ArXiv CS.AI
Authors: A. J. Lew (Unreasonable Labs), Y. Cao (Unreasonable Labs), M. J. Buehler (Unreasonable Labs)
Article type: NEWS
Language: en
Publication date: 2026-05-29

Original text

Abstract

arXiv:2605.30284v1 (announce type: new)

Scientific discovery is an inherently creative and uncertain process, requiring reasoning beyond the recall of known knowledge. While many benchmarks have been proposed to evaluate large language model (LLM) performance on deep research tasks via multi-hop retrieval, the innovative reasoning abilities that true scientific discovery requires of these models remain largely untested. We introduce a benchmark framework for evaluating model performance in scientific discovery and reasoning, building up from a raw problem to the classical null hypothesis test. In our framework, models initially receive only the topic and research question from a recent paper, and technical details are then progressively revealed.
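The progressive-disclosure setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `PaperRecord` fields, the prompt wording, the one-detail-per-stage schedule, and the `generate` callable (a stand-in for any LLM call) are all assumptions made for illustration. The abstract only states that models first see the topic and research question and that technical details are revealed afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PaperRecord:
    """Hypothetical container for one benchmark item (field names are assumptions)."""
    topic: str
    research_question: str
    technical_details: List[str]  # assumed to be ordered from least to most revealing


def build_stage_prompts(paper: PaperRecord) -> List[str]:
    """Return one prompt per disclosure stage.

    Stage 0 holds only the topic and research question; each later stage
    adds one more technical detail (an assumed schedule).
    """
    base = f"Topic: {paper.topic}\nResearch question: {paper.research_question}"
    prompts = [base]
    for stage in range(1, len(paper.technical_details) + 1):
        revealed = "\n".join(
            f"Detail {i + 1}: {detail}"
            for i, detail in enumerate(paper.technical_details[:stage])
        )
        prompts.append(f"{base}\n{revealed}")
    return prompts


def run_progressive_disclosure(
    paper: PaperRecord, generate: Callable[[str], str]
) -> List[str]:
    """Query `generate` once per stage and collect the proposed hypotheses."""
    instruction = "\nPropose a testable hypothesis."
    return [generate(prompt + instruction) for prompt in build_stage_prompts(paper)]
```

Passing a stub such as `lambda prompt: prompt` as `generate` makes it easy to inspect exactly what would be sent at each stage before wiring in a real model; how the resulting hypotheses are scored is outside the scope of this sketch.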
