Beyond String Matching: Semantic Evaluation of PDF Table Extraction

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2603.18652v2. Announce Type: replace.

Abstract: Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables sourced from ar