Event: ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.00644v1 Announce Type: new

Abstract: AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgements from historical evidence. ForeSci contains 500 tasks across four fast-movin