QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples

arXiv cs.CL, 2026-06-04. Authors: Mengao Zhang, Xiang Yang, Chang Liu, Tianhui Tan, Ke-wei Huang

Abstract

arXiv:2606.04646v1

Many real-world questions over business, legal, and scientific corpora are natural-language versions of database-style queries over records latent in text. Existing retrieval-augmented generation (RAG) systems are optimized primarily for semantic relevance, but retrieving plausible passages does not guarantee correct query execution. We introduce QO-Bench, a diagnostic benchmark for query-operator question answering over typed event tuples. The benchmark covers 22,984 news articles and 614 corporate events across 18 query templates, and is evaluated on 785 questions. Each gold answer is computed deterministically from the typed event tuples. Systems are scored by recall, with answers matched to the gold tuples by exact match rather than by an LLM judge. This design enables diagnosis at the level of individual query operators, such as joins and intersections.
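To make the scoring idea concrete, the following is a minimal sketch of how a gold answer could be computed deterministically from typed event tuples and then scored by exact-match recall. It is an illustration under stated assumptions, not the benchmark's actual implementation: the `Event` schema, the sample records, and the helper names `companies_with`, `gold_intersection`, and `exact_match_recall` are all hypothetical and do not come from the paper.

```python
from typing import Iterable, NamedTuple, Set


class Event(NamedTuple):
    """Hypothetical typed event tuple; the real QO-Bench schema may differ."""
    company: str
    event_type: str
    year: int


# Made-up records for illustration only (not benchmark data).
EVENTS = [
    Event("AcmeCo", "acquisition", 2021),
    Event("AcmeCo", "layoff", 2022),
    Event("BetaInc", "acquisition", 2021),
    Event("GammaLtd", "layoff", 2022),
]


def companies_with(events: Iterable[Event], event_type: str) -> Set[str]:
    """Selection: companies that have at least one event of the given type."""
    return {e.company for e in events if e.event_type == event_type}


def gold_intersection(events: Iterable[Event], type_a: str, type_b: str) -> Set[str]:
    """Intersection operator: companies that have both event types."""
    events = list(events)
    return companies_with(events, type_a) & companies_with(events, type_b)


def exact_match_recall(predicted: Iterable[str], gold: Set[str]) -> float:
    """Fraction of gold items that appear verbatim among the predictions."""
    if not gold:
        # Assumption: an empty gold set counts as fully recalled.
        return 1.0
    return len(set(predicted) & gold) / len(gold)


gold_both = gold_intersection(EVENTS, "acquisition", "layoff")
gold_acq = companies_with(EVENTS, "acquisition")
recall_partial = exact_match_recall(["AcmeCo"], gold_acq)
```

In this sketch, a system that returns only "AcmeCo" for the acquisition query recovers one of two gold companies, so its recall is 0.5. Because matching is by exact string equality, a differently cased or paraphrased answer would not count, which is the property that removes the need for an LLM judge.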

