Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

arXiv cs.CL | 2026-06-01 | Authors: Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

Abstract

arXiv:2605.31550v1 (announce type: new)

Table question answering requires models to recover semantic relations that are encoded only implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as the intermediate table representation, but these layout-oriented serializations add markup overhead and force large language models to infer header-cell alignments from row and column spans. We propose Semantic Triplet Restoration (STR), a protocol that rewrites each cell as an atomic fact (item path, feature path, value), where the item path specifies the row-wise entity, the feature path specifies the hierarchical attribute, and the value contains the cell content. We also present TripletQL, a lightweight query-aware router that uses STR to select an appropriate rendering or a filtered subset of triplets for each question. Across four Chinese and English table-QA benchmarks, STR matches or improves upon HTML-based baselines while reducing input tokens.
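To make the triplet idea concrete, the following is a minimal sketch of how a table with two-level column headers might be flattened into (item path, feature path, value) facts. It is an illustration under stated assumptions, not the authors' implementation: the function name `table_to_triplets`, the " > " path separator, the convention that merged header cells arrive already expanded (repeated in every column they span), and the choice to skip empty cells are all hypothetical.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (item path, feature path, value)


def table_to_triplets(
    header_rows: List[List[str]],
    body_rows: List[List[str]],
    n_item_cols: int = 1,
    sep: str = " > ",
) -> List[Triplet]:
    """Flatten a table into (item path, feature path, value) triplets.

    Assumptions (hypothetical, not taken from the paper):
    - header_rows lists header levels top-down, with merged cells already
      expanded so every row has one label per column;
    - the first n_item_cols cells of each body row identify the row entity;
    - empty cells carry no fact and are skipped.
    """
    n_cols = len(header_rows[0])

    # Build one hierarchical feature path per value column by reading the
    # header labels top-down, dropping blanks and vertically repeated labels.
    feature_paths = []
    for col in range(n_item_cols, n_cols):
        levels: List[str] = []
        for row in header_rows:
            label = row[col].strip()
            if label and (not levels or levels[-1] != label):
                levels.append(label)
        feature_paths.append(sep.join(levels))

    triplets: List[Triplet] = []
    for row in body_rows:
        # The item path joins the identifying cells of the row.
        item_path = sep.join(c.strip() for c in row[:n_item_cols] if c.strip())
        for path, value in zip(feature_paths, row[n_item_cols:]):
            if value.strip():
                triplets.append((item_path, path, value.strip()))
    return triplets


if __name__ == "__main__":
    # "2023" and "2024" each span two columns in the original layout.
    header = [
        ["Region", "2023", "2023", "2024", "2024"],
        ["Region", "Revenue", "Profit", "Revenue", "Profit"],
    ]
    body = [
        ["North", "10", "2", "12", "3"],
        ["South", "8", "1", "", "2"],
    ]
    for item, feature, value in table_to_triplets(header, body):
        print(f"{item} | {feature} | {value}")
```

Each printed line is a self-contained fact, for example `North | 2023 > Revenue | 10`, so a reader or model does not need to resolve row or column spans to know which headers a value belongs to. The empty cell for South under 2024 Revenue produces no triplet in this sketch.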