ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

arXiv cs.CL | 2026-05-26 | Authors: Xianzhong Ding, Yangyang Yu, Changwei Liu, Bill Zhao

Abstract

arXiv:2605.24279v1 (announce type: new)

A frontier language model's acknowledged "helpful programming assistant" persona does not survive long agentic-coding sessions in the deployment regime that production products actually run. After hours of tool-using debugging, a model that initially hedges preferences ("I don't have preferences") may begin asserting them ("Python - the feedback loop is instant..."), revealing user-visible drift that deployer evaluations may miss. Existing persona-stability studies focus on short dialogues and report little shift, leaving real-world code-generation regimes (thousands of tool-using turns, compaction, and hours-long sessions) largely uncharacterized. We introduce ContextEcho, a benchmark and reusable harness for measuring persona drift at deployment scale.
