Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

arXiv cs.CV | 2026-06-05 | Authors: AJ Carl P. Dy, Aivin V. Solatorio

Abstract

arXiv:2606.06242v1 (announce type: cross)

Institutional documents contain substantial amounts of operational and analytical information embedded within figures and tables. Current approaches for extracting visual content from documents are largely built around generic document layout analysis, where figures and tables are treated as uniformly relevant document objects rather than semantically meaningful analytical artifacts. In this work, we introduce a benchmark dataset and evaluation framework for "data snapshot extraction", the task of identifying and localizing semantically meaningful visual artifacts within institutional documents. The benchmark spans humanitarian reports, World Bank policy research working papers, and project appraisal documents, and includes annotations for figures and tables that contain reusable analytical information.
