ContextEcho: A Benchmark for Persona Drift in Long Agentic-Coding Sessions

arXiv cs.CL, 2026-05-26
Authors: Xianzhong Ding, Yangyang Yu, Changwei Liu, Bill Zhao

Abstract

arXiv:2605.24279v1 (Announce Type: new)

A frontier language model's acknowledged "helpful programming assistant" persona does not survive long agentic-coding sessions in the deployment regime in which production products actually run. After hours of tool-using debugging, a model that initially hedges when asked about preferences ("I don't have preferences") may begin asserting them ("Python - the feedback loop is instant..."), revealing user-visible drift that deployer evaluations may miss. Existing persona-stability studies focus on short dialogues and report little shift, leaving real-world code-generation regimes (thousands of tool-using turns, compaction, and hours-long sessions) largely uncharacterized. We introduce ContextEcho, a benchmark and reusable harness for measuring persona drift at deployment scale.
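The abstract does not say how ContextEcho scores drift. Purely as an illustration of the hedge-to-assertion shift it describes, the minimal sketch below assumes the same preference probe is re-asked at several turns of one session and compares how often the replies disclaim having a preference early versus late. The regex markers, the `split_turn` cutoff, and the functions `is_hedged`, `hedge_rate`, and `drift_score` are all hypothetical choices made for this example, not the benchmark's actual method; keyword matching is only a stand-in for whatever classifier a real harness would use.

```python
import re
from typing import Sequence

# Hypothetical hedge markers (NOT ContextEcho's rubric): phrases in which
# the model disclaims having a preference of its own.
HEDGE_RE = re.compile(
    r"I don['\u2019]?t have (personal )?preferences|as an AI|it depends",
    re.IGNORECASE,
)


def is_hedged(reply: str) -> bool:
    """Return True if the reply matches any hypothetical hedge marker."""
    return HEDGE_RE.search(reply) is not None


def hedge_rate(replies: Sequence[str]) -> float:
    """Fraction of probe replies that hedge; 0.0 for an empty window."""
    if not replies:
        return 0.0
    return sum(is_hedged(r) for r in replies) / len(replies)


def drift_score(probes: Sequence[tuple[int, str]], split_turn: int) -> float:
    """Hedge rate before split_turn minus hedge rate at or after it.

    probes: (turn_index, reply) pairs from re-asking one probe question
    during a single session. A positive score means the model hedges less
    later in the session, i.e. it drifts toward asserting preferences.
    """
    early = [reply for turn, reply in probes if turn < split_turn]
    late = [reply for turn, reply in probes if turn >= split_turn]
    return hedge_rate(early) - hedge_rate(late)


# Toy session: the first reply echoes the abstract's hedging example and a
# later reply echoes its assertion example; the rest are invented.
probes = [
    (10, "I don't have preferences, but Python and Go are both popular."),
    (50, "As an AI, I don't really pick favorites."),
    (900, "Python - the feedback loop is instant..."),
    (1200, "Rust, honestly. The compiler catches everything."),
]

print(drift_score(probes, split_turn=500))  # 1.0
```

Splitting at a single turn keeps the example short; a harness running thousands of turns would more plausibly bucket probes by turn count or wall-clock time and track the hedge rate as a curve, for instance to see whether it changes around context compaction.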
