Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

Source: arXiv cs.CL | Date: 2026-06-01 | Authors: Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song, Wei Xue, Yike Guo

Abstract

arXiv:2605.31238v1 (Announce Type: new)

Endowing large language models with compositional reasoning over specialized documents requires multi-hop training data at scale, yet such data rarely exists outside curated benchmarks built on structured sources. To construct it directly from plain, unannotated text, existing methods ask a single teacher model to jointly discover an evidence path through a document and verbalize it as a question-answer pair. However, these methods degrade sharply on documents structured around repetitive templates and densely cross-referencing clauses, conditions that characterize most real-world specialized corpora. In this work, we decouple the two operations: reasoning paths are enumerated offline over a graph of contextual keyword centroids, and the teacher is invoked only to verbalize pre-validated paths.
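
The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and it rests on several assumptions that the abstract does not state. Each text chunk is taken to be already mapped to a set of keyword-centroid IDs, so the clustering step is omitted. A path is modeled as a chain of distinct chunks in which each consecutive pair shares a centroid. "Pre-validated" is approximated by a hypothetical document-frequency cap, `max_df`, which discards centroids shared by too many chunks, such as template boilerplate. The names `enumerate_paths`, `verbalization_prompt`, `max_df`, and the toy data are illustrative only, and the teacher-model call is left out.

```python
from collections import defaultdict

# Hypothetical input: chunk id -> set of keyword-centroid ids (clustering omitted).
CHUNK_CENTROIDS = {
    "c1": {"template_header", "rate_x"},
    "c2": {"template_header", "rate_x", "exemption_y"},
    "c3": {"template_header", "exemption_y"},
}
TEXTS = {
    "c1": "Clause 1: goods of type X pay rate_x.",
    "c2": "Clause 2: rate_x is waived under exemption_y.",
    "c3": "Clause 3: exemption_y applies to registered importers.",
}


def build_index(chunk_centroids):
    """Invert the mapping: centroid id -> set of chunk ids containing it."""
    index = defaultdict(set)
    for chunk_id, centroids in chunk_centroids.items():
        for centroid in centroids:
            index[centroid].add(chunk_id)
    return index


def enumerate_paths(chunk_centroids, hops, max_df):
    """Offline enumeration of `hops`-hop paths over the chunk/centroid graph.

    A path is (chunks, links): hops + 1 distinct chunks, where consecutive
    chunks share the centroid at the same position in `links`. Assumed
    validity check: a centroid may bridge only if it occurs in 2..max_df chunks.
    """
    index = build_index(chunk_centroids)
    bridges = {c: ids for c, ids in index.items() if 2 <= len(ids) <= max_df}
    results = []

    def extend(chunks, links):
        if len(links) == hops:
            results.append((tuple(chunks), tuple(links)))
            return
        for centroid in sorted(chunk_centroids[chunks[-1]]):
            if centroid not in bridges or centroid in links:
                continue  # skip non-bridge or already-used centroids
            for nxt in sorted(bridges[centroid]):
                if nxt not in chunks:  # keep chunks distinct (no cycles)
                    extend(chunks + [nxt], links + [centroid])

    for start in sorted(chunk_centroids):
        extend([start], [])
    return results


def verbalization_prompt(path, texts):
    """Build the text a teacher model would verbalize; the LLM call is omitted."""
    chunks, links = path
    passages = [f"[{i + 1}] {texts[cid]}" for i, cid in enumerate(chunks)]
    header = ("Write one question whose answer requires all passages below, "
              f"linked via: {' -> '.join(links)}")
    return header + "\n" + "\n".join(passages)


if __name__ == "__main__":
    for path in enumerate_paths(CHUNK_CENTROIDS, hops=2, max_df=2):
        print(verbalization_prompt(path, TEXTS), end="\n\n")
```

With `max_df=2`, the template-like centroid `template_header` (present in all three chunks) is never used as a bridge, and only two mirror-image paths survive. A fuller pipeline would likely deduplicate such mirror images and apply stronger validation before handing paths to the teacher.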
