A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents 文章

ArXiv CS.CL2026-05-28NEWSen作者: Bao Gia Doan, Aditya Joshi, Pantelis Elinas, Aarya Bodhankar, Oscar Leslie, Tom Marchant, Flora Salim

查看原文 →

关系图谱

摘要

arXiv:2604.17943v2 Announce Type: replace Abstract: RAG-based question-answering (QA) in specialist domains faces a cold-start problem: lack of evaluative benchmarks and absence of labeled data for post-training. We present DoRA (Domain-oriented RAG Assessment), a novel benchmark construction and evaluation framework using only a small set of specialist domain documents. DoRA systematically generates synthetic QA training and evaluation datasets with auditable evidence across five domain-specific intents. To mitigate same-pipeline circularity, DoRA's training and test splits use different LLM families (Claude Sonnet for training; GPT-4o for test) drawn from disjoint seed-document corpora. Instantiated on 40 defense-related documents (written in English), DoRA yields ~6.6K curated instances. Compared against 8 LLM baselines over a benchmark of 1,259 samples, a LoRA-adapted Llama3.

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (12)

相关技术查看全部 (3)