Chunking German Legal Code 文章

ArXiv CS.CL2026-06-01NEWSen作者: Max Prior, Natalia Milanova, Andreas Schultz

摘要

arXiv:2605.19806v2 Announce Type: replace Abstract: This paper investigates chunking strategies for retrieval-augmented generation on German statutory law, using the German Civil Code as a structured benchmark corpus. We implement and compare a range of segmentation approaches, including structural units (sections, subsections, sentences, propositions), fixed-size windows, contextual chunking, semantic clustering, Lumber-style chunking, and RAPTOR-based hierarchical retrieval. All methods are evaluated on a legal question-answering dataset with section-level gold labels, measuring recall, query latency, index build time, and storage requirements. Results show that chunking strategies aligned with the inherent legal structure - particularly section and subsection - based retrieval-achieve the highest recall, while more complex approaches that override this structure perform worse.

相关事件查看全部 (1)

Chunking German Legal Code
2026-06-01PRODUCT_LAUNCH影响: MEDIUM

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据