NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

arXiv cs.CL, 2026-05-27. Authors: Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch

Abstract

arXiv:2503.08600v3

We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts. While previous scientific claim verification datasets have been limited in size and scope, NSF-SciFy represents a significant advance with 2.8 million claims from 400,000 abstracts spanning all science and mathematics disciplines. We present two focused subsets: NSF-SciFy-MatSci with 114,000 claims from materials science awards, and NSF-SciFy-20K with 135,000 claims across five NSF directorates. Using zero-shot prompting, we develop a scalable approach for joint extraction of scientific claims and investigation proposals. We demonstrate the dataset's utility through three downstream tasks: non-technical abstract generation, claim extraction, and investigation proposal extraction.