Selective QA over Conflicting Multi-Source Personal Memory: A Diagnostic Testbed and Method Comparison

arXiv cs.AI | 2026-05-29 | Authors: Tiancheng Yang, Matthias Schonlau, Ilia Sucholutsky

Abstract

arXiv:2605.30087v1 (announce type: new). Emerging personal AI agents are moving toward persistent, multi-source memory. This creates an evaluation problem: systems must decide how to use conflicting or incomplete evidence; they cannot just retrieve facts from one clean history. Existing benchmarks rarely show whether an error came from the evidence given to a method or from the method's conflict-resolution step. We study this as selective QA over conflicting multi-source personal memory: systems answer based on conflicting, sometimes incomplete sources, or abstain when evidence is insufficient. We develop a benchmark containing 18 question templates across 8 reasoning types, 480 personas, 4 random seeds, and 34,560 instances, with controlled source distortions and deterministic ground truth. We evaluate baselines with no source access, baselines with access to a single source, structured fusion methods, and frontier LLMs. The best trained fusion resolver reaches 80.
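The benchmark sizes quoted in the abstract can be checked with a little arithmetic. The sketch below assumes that every question template is instantiated once for every persona and every random seed. The abstract does not state this, but the reported total is consistent with it. The function name `full_cross_product` is illustrative and not taken from the paper.

```python
# Counts quoted in the abstract.
NUM_TEMPLATES = 18
NUM_PERSONAS = 480
NUM_SEEDS = 4
REPORTED_INSTANCES = 34_560


def full_cross_product(templates: int, personas: int, seeds: int) -> int:
    """Instance count if each template is generated for each persona and seed.

    This full-cross-product design is an assumption, not a stated fact.
    """
    return templates * personas * seeds


computed = full_cross_product(NUM_TEMPLATES, NUM_PERSONAS, NUM_SEEDS)

# Compare the computed total against the figure reported in the abstract.
print(f"computed={computed}, matches_reported={computed == REPORTED_INSTANCES}")
```

Under this assumption, each persona contributes 18 × 4 = 72 instances, and the 8 reasoning types group the templates without changing the total.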
