ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

ArXiv CS.AI | 2026-06-02 | NEWS | en | Authors: Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin, Youyong Kong

Details

Source site
ArXiv CS.AI
Authors
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin, Youyong Kong
Article type
NEWS
Language
en
Publication date
2026-06-02

Abstract

arXiv:2606.00644v1 (announce type: new)

AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgments from historical evidence. ForeSci contains 500 tasks across four fast-moving AI domains and four decision families. Each task is paired with a cutoff-aligned offline knowledge base; post-cutoff papers are hidden during generation and used only for validation. To avoid random future-event prediction, tasks are derived from pre-cutoff taxonomy branches and evidence signals, and answer-generation backbones are selected to precede the task cutoffs. We evaluate native LLMs, Hybrid RAG, and three research-agent adaptations across four backbones.
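
The temporal control described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Paper` record, the function names `split_by_cutoff` and `backbone_is_admissible`, and the choice to treat a paper published exactly on the cutoff date as pre-cutoff are all assumptions made for illustration. The abstract only states that post-cutoff papers are hidden during generation and that backbones are selected to precede the task cutoffs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Paper:
    """Hypothetical record; the benchmark's real schema is not given in the abstract."""
    title: str
    published: date


def split_by_cutoff(papers, cutoff):
    """Partition papers into a pre-cutoff knowledge base and a post-cutoff pool.

    Assumption: a paper published on the cutoff date itself counts as pre-cutoff.
    """
    knowledge_base = [p for p in papers if p.published <= cutoff]
    validation_pool = [p for p in papers if p.published > cutoff]
    return knowledge_base, validation_pool


def backbone_is_admissible(training_cutoff, task_cutoff):
    """Assumed reading of 'backbones precede the task cutoffs':
    the model's training data must end strictly before the task cutoff."""
    return training_cutoff < task_cutoff


# Illustrative data only; these are not papers or dates from the benchmark.
papers = [
    Paper("A", date(2024, 1, 10)),
    Paper("B", date(2024, 6, 30)),
    Paper("C", date(2024, 9, 1)),
]
kb, held_out = split_by_cutoff(papers, cutoff=date(2024, 6, 30))
print([p.title for p in kb])        # visible to the agent during generation
print([p.title for p in held_out])  # used only for validation
```

Under these assumptions, an agent would retrieve only from `kb`, while `held_out` stays unseen until its answer is scored.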
