DiagramBank: A Quality-Audited Dataset of Scientific Schematic Diagrams with Multi-Level Document Context

ArXiv CS.AI | 2026-05-28 | NEWS | en | Authors: Ling Yue, Tingwen Zhang, Jiaying Wang, Zhen Xu, Shaowu Pan

Details

Source site
ArXiv CS.AI
Authors
Ling Yue, Tingwen Zhang, Jiaying Wang, Zhen Xu, Shaowu Pan
Article type
NEWS
Language
en
Publication date
2026-05-28

Abstract

arXiv:2604.20857v2 (Announce Type: replace-cross)

Scientific papers use schematic diagrams to communicate methods, workflows, and system structure, yet existing scientific-figure corpora often mix them with plots, screenshots, and photographs and rarely preserve document context. We introduce DiagramBank, a quality-audited dataset of 57,100 schematic diagrams curated from OpenReview-hosted AI/ML venues. Each record links a diagram image to its paper title, abstract, figure caption, in-text figure-reference spans, venue/year metadata, provenance fields, and filtering labels. DiagramBank is a reusable resource for scientific-document understanding, diagram retrieval, corpus analysis, and future benchmark construction. We describe its extraction and cascade-filtering pipeline, release schema, confidence-controlled views, dataset card, and indexing utilities. A manual blind audit of the released cascade-filtered records estimates 93.
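
The record structure and confidence-controlled views described in the abstract could look roughly like the sketch below. This is only an illustration: every field name, the `filter_confidence` score, and the `confidence_view` helper are hypothetical and are not taken from the released DiagramBank schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagramRecord:
    """One diagram plus its document context (all field names are assumed)."""
    image_path: str                     # path to the diagram image
    paper_title: str
    abstract: str
    caption: str                        # figure caption
    reference_spans: List[str] = field(default_factory=list)  # in-text mentions
    venue: str = ""
    year: int = 0
    source_url: str = ""                # provenance (hypothetical field)
    is_schematic: bool = True           # filtering label (hypothetical field)
    filter_confidence: float = 0.0      # assumed score in [0, 1]


def confidence_view(records, min_confidence):
    """Keep records labelled schematic whose score meets the threshold."""
    return [
        r for r in records
        if r.is_schematic and r.filter_confidence >= min_confidence
    ]


# Toy records for illustration only; not real dataset entries.
records = [
    DiagramRecord("a.png", "Paper A", "Abstract A", "Figure 1: Overview.",
                  ["As shown in Figure 1"], "ICLR", 2024,
                  filter_confidence=0.95),
    DiagramRecord("b.png", "Paper B", "Abstract B", "Figure 2: Pipeline.",
                  [], "NeurIPS", 2023, filter_confidence=0.60),
    DiagramRecord("c.png", "Paper C", "Abstract C", "Figure 3: Accuracy plot.",
                  [], "ICML", 2022, is_schematic=False,
                  filter_confidence=0.99),
]

strict_view = confidence_view(records, 0.9)
loose_view = confidence_view(records, 0.5)
print([r.image_path for r in strict_view])
print([r.image_path for r in loose_view])
```

Raising the threshold trades coverage for precision, which is one plausible reading of a "confidence-controlled view"; the actual release may define its views differently.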

Related companies

OpenReview (COMPANY)
