Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

arXiv cs.AI | 2026-05-29 | Authors: Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun, Yehua Yang, Qiao Zhao

Abstract

arXiv:2605.29277v1 (announce type: cross). We present Code-QA-Bench, a fully automated framework for synthesizing repository-level code understanding benchmarks that separates genuine code comprehension from documentation recall and pretraining memorization. The framework makes two methodological contributions: (1) an answer-first generation pipeline where a tool-equipped agent explores source code to produce verified gold answers before deriving questions, ensuring every task is grounded in real code structure; and (2) a three-condition experimental design evaluating agents under closed-book (no repository), code-only (documentation removed), and documented (full repository) conditions, with deltas directly quantifying documentation utility and memorization. We generate 528 code-derivable and 100 doc-dependent tasks across 10 Python repositories from SWE-Bench, scored by an LLM judge on accuracy, completeness, and specificity.
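The three-condition comparison can be illustrated with a small sketch. The abstract does not give the exact delta formulas, so the definitions here are assumptions: the closed-book mean is treated as a proxy for memorization, code-only minus closed-book as the gain from reading code, and documented minus code-only as the gain from documentation. The names `TaskScores` and `condition_deltas` and the 0-to-1 score scale are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskScores:
    """Judge scores for one task under each condition (assumed 0..1 scale)."""
    closed_book: float  # no repository access
    code_only: float    # repository with documentation removed
    documented: float   # full repository


def condition_deltas(tasks: list[TaskScores]) -> dict[str, float]:
    """Average each condition over tasks and report between-condition deltas.

    The delta definitions are illustrative assumptions, not the paper's.
    """
    if not tasks:
        raise ValueError("need at least one task")
    closed = mean(t.closed_book for t in tasks)
    code = mean(t.code_only for t in tasks)
    documented = mean(t.documented for t in tasks)
    return {
        # What the agent answers with no repository: memorization proxy.
        "closed_book_mean": closed,
        # Improvement attributable to reading the source code.
        "code_access_gain": code - closed,
        # Improvement attributable to documentation on top of the code.
        "documentation_gain": documented - code,
    }


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    demo = [TaskScores(0.25, 0.5, 0.75), TaskScores(0.75, 1.0, 1.0)]
    for name, value in condition_deltas(demo).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed definitions, a high closed-book mean combined with a small code-access gain would suggest that the answers come largely from pretraining rather than from reading the repository.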
