Details
- Source site
- ArXiv CS.AI
- Authors
- Ling Yue, Tingwen Zhang, Jiaying Wang, Zhen Xu, Shaowu Pan
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-05-28
Abstract
arXiv:2604.20857v2. Scientific papers use schematic diagrams to communicate methods, workflows, and system structure, yet existing scientific-figure corpora often mix them with plots, screenshots, and photographs and rarely preserve document context. We introduce DiagramBank, a quality-audited dataset of 57,100 schematic diagrams curated from OpenReview-hosted AI/ML venues. Each record links a diagram image to its paper title, abstract, figure caption, in-text figure-reference spans, venue/year metadata, provenance fields, and filtering labels. DiagramBank is a reusable resource for scientific-document understanding, diagram retrieval, corpus analysis, and future benchmark construction. We describe its extraction and cascade-filtering pipeline, release schema, confidence-controlled views, dataset card, and indexing utilities. A manual blind audit of the released cascade-filtered records estimates 93.
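To make the record structure described in the abstract more concrete, the following is a minimal sketch of how one record and a confidence-controlled view might be represented. Every name here (`DiagramRecord`, `confidence_view`, and all field names, including `is_schematic` and `filter_confidence`) is a hypothetical illustration, not the released DiagramBank schema, and the toy records are invented placeholders rather than dataset entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiagramRecord:
    # Hypothetical field names; the released schema may differ.
    image_path: str                      # path to the diagram image
    paper_title: str
    abstract: str
    caption: str                         # figure caption
    reference_spans: List[str] = field(default_factory=list)  # in-text figure references
    venue: Optional[str] = None
    year: Optional[int] = None
    is_schematic: bool = True            # illustrative filtering label
    filter_confidence: float = 1.0       # illustrative score in [0, 1]


def confidence_view(records: List[DiagramRecord], min_confidence: float) -> List[DiagramRecord]:
    """Keep records labeled schematic whose confidence meets the threshold."""
    return [
        r for r in records
        if r.is_schematic and r.filter_confidence >= min_confidence
    ]


# Toy placeholder records, invented purely for illustration.
toy_records = [
    DiagramRecord("a.png", "Paper A", "Abstract A", "Figure 1: overview.",
                  reference_spans=["As shown in Figure 1"], filter_confidence=0.95),
    DiagramRecord("b.png", "Paper B", "Abstract B", "Figure 2: pipeline.",
                  filter_confidence=0.60),
    DiagramRecord("c.png", "Paper C", "Abstract C", "Figure 3: accuracy plot.",
                  is_schematic=False, filter_confidence=0.99),
]

high_confidence = confidence_view(toy_records, min_confidence=0.9)
```

Raising `min_confidence` trades coverage for precision, which is one plausible reading of what a confidence-controlled view offers; the actual thresholds and label definitions would come from the dataset card.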