MemFail: Stress-Testing Failure Modes of LLM Memory Systems

ArXiv CS.AI | 2026-05-27 | Authors: Ishir Garg, Neel Kolhe, Dawn Song, Xuandong Zhao

Abstract

arXiv:2605.26667v1. Large language model (LLM) agents increasingly rely on external memory systems to remain consistent across long-horizon interactions, but little empirical work has been done to understand the specific failure modes and design choices that these systems present. Existing benchmarks report aggregate question-answering accuracy and treat memory systems as black boxes, making it impossible to attribute an incorrect answer to a particular failure mode of the system. We introduce MemFail, a diagnostic benchmark that isolates the failure modes of modern LLM memory systems. We begin by formalizing memory systems as the composition of three canonical operations (summarization, storage, and retrieval) and identify the potential failure modes induced by each. Based on these hypothesized failure modes, we construct five datasets spanning four tasks, each adversarially designed to test a specific operation of a memory system.
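To make the three-operation view concrete, here is a minimal Python sketch of a memory system as a composition of summarization, storage, and retrieval, with a probe that reports the first stage at which a known fact goes missing. It is an illustration only, not the MemFail benchmark or the authors' formalization: `MemorySystem`, `diagnose`, `keyword_retrieve`, and the toy components are hypothetical names, and a substring check stands in for whatever correctness criterion a real benchmark would use.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemorySystem:
    """Hypothetical memory system built from three pluggable operations."""
    summarize: Callable[[str], str]                # turn -> compressed note
    store: Callable[[List[str], str], List[str]]   # (memory, note) -> new memory
    retrieve: Callable[[str, List[str]], List[str]]  # (query, memory) -> hits
    memory: List[str] = field(default_factory=list)

    def write(self, turn: str) -> None:
        self.memory = self.store(self.memory, self.summarize(turn))

    def read(self, query: str) -> List[str]:
        return self.retrieve(query, self.memory)


def diagnose(system: MemorySystem, turns: List[str], query: str, fact: str) -> str:
    """Name the first operation that loses `fact` (substring proxy), else 'ok'."""
    for turn in turns:
        system.write(turn)
    if not any(fact in system.summarize(t) for t in turns):
        return "summarization"
    if not any(fact in note for note in system.memory):
        return "storage"
    if not any(fact in hit for hit in system.read(query)):
        return "retrieval"
    return "ok"


def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_retrieve(query: str, memory: List[str]) -> List[str]:
    """Toy retriever: return notes sharing at least one word with the query."""
    return [note for note in memory if _words(query) & _words(note)]


def append_store(memory: List[str], note: str) -> List[str]:
    """Toy unbounded store."""
    return memory + [note]


def bounded_store(memory: List[str], note: str) -> List[str]:
    """Toy store that keeps only the two most recent notes."""
    return (memory + [note])[-2:]


TURNS = ["My dog is named Biscuit.", "I like hiking.", "I work in Oslo."]
QUERY = "What is my dog called?"
FACT = "Biscuit"

healthy = MemorySystem(lambda t: t, append_store, keyword_retrieve)
lossy_summary = MemorySystem(lambda t: t[:10], append_store, keyword_retrieve)
evicting = MemorySystem(lambda t: t, bounded_store, keyword_retrieve)
recency_only = MemorySystem(lambda t: t, append_store, lambda q, m: m[-1:])
```

Calling `diagnose(system, TURNS, QUERY, FACT)` on each toy system attributes the miss to a single operation, which is the kind of attribution that an aggregate question-answering accuracy score cannot provide.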
