DTBench: A Synthetic Benchmark for Document-to-Table Extraction

ArXiv CS.AI · 2026-06-01 · News · en

Authors: Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, Yunjun Gao

Abstract

arXiv:2602.13812v3 (Announce Type: replace-cross)

Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in flexible information extraction, their ability to produce precisely structured tables remains insufficiently understood, particularly for indirect extraction that requires complex capabilities such as reasoning and conflict resolution. Existing benchmarks neither explicitly distinguish nor comprehensively cover the diverse capabilities required in Doc2Table extraction. We argue that a capability-aware benchmark is essential for systematic evaluation. However, constructing such benchmarks from human-annotated document-table pairs is costly, difficult to scale, and limited in capability coverage.
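To make the task definition concrete, here is a minimal sketch, assuming a hypothetical three-column target schema and hand-written rows standing in for an extractor's output (neither comes from DTBench or the paper). It checks that the rows conform to the schema and then loads them into an in-memory SQLite table, illustrating how a schema-conformant table supports SQL queries. It covers only the simple, direct case and does not model the reasoning or conflict resolution the abstract mentions.

```python
import sqlite3

# Hypothetical target schema: column name -> exact expected Python type.
# Illustrative only; not taken from DTBench.
TARGET_SCHEMA = {"company": str, "year": int, "revenue_musd": float}

# Hypothetical rows an extractor might emit for a document about company revenues.
extracted_rows = [
    {"company": "Acme", "year": 2023, "revenue_musd": 12.5},
    {"company": "Acme", "year": 2024, "revenue_musd": 15.0},
    {"company": "Globex", "year": 2024, "revenue_musd": 9.75},
]


def conforms(rows, schema):
    """Return True if every row has exactly the schema's columns with exactly the expected types."""
    for row in rows:
        if set(row) != set(schema):
            return False
        for col, expected in schema.items():
            # Exact type match, so e.g. a bool is not accepted for an int column.
            if type(row[col]) is not expected:
                return False
    return True


def load_and_query(rows, schema):
    """Load rows into an in-memory SQLite table and return total revenue per year."""
    sql_types = {str: "TEXT", int: "INTEGER", float: "REAL"}
    cols = list(schema)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE extracted ("
        + ", ".join(f"{c} {sql_types[t]}" for c, t in schema.items())
        + ")"
    )
    conn.executemany(
        f"INSERT INTO extracted ({', '.join(cols)}) VALUES ({', '.join('?' for _ in cols)})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    result = conn.execute(
        "SELECT year, SUM(revenue_musd) FROM extracted GROUP BY year ORDER BY year"
    ).fetchall()
    conn.close()
    return result


if conforms(extracted_rows, TARGET_SCHEMA):
    print(load_and_query(extracted_rows, TARGET_SCHEMA))  # [(2023, 12.5), (2024, 24.75)]
```

Rejecting non-conformant rows before loading is one simple way to keep downstream SQL results verifiable; a real evaluation would additionally compare the extracted cells against a reference table.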
