Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

ArXiv cs.CV | 2026-06-03 | Authors: Abhishek Kumar, Isha Motiyani, Tilak Kasturi, Ethan Seefried, Prahitha Movva, Tirthankar Ghosal

Abstract

arXiv:2606.03410v1 (announce type: new). Engineering diagrams pose a distinct challenge for vision-language models (VLMs): unlike natural images or general documents, they encode information through dense spatial layouts, domain-specific symbols, and cross-references between visual callouts and structured parts tables. Despite their centrality to service, repair, and design workflows, there is no public benchmark for measuring VLM capabilities in this domain; existing datasets primarily focus on flowcharts, scientific figures, or business documents. To address this gap, we introduce Enginuity, the first open dataset and benchmark for evaluating VLMs on complex engineering diagrams. We define two benchmark tasks over a corpus of U.S. military service and repair manuals: structured parts-table extraction (Task 1) and free-form visual question answering (VQA) over diagrams (Task 2). We evaluate four frontier VLMs (GPT-5.2 Chat, Claude Opus 4.
