Abstract
arXiv:2605.27969v1 Announce Type: new
Abstract: Post-trained language-model assistants are often optimized to avoid under-answering, encouraging complete, helpful, cautious, and proactive responses. We ask whether this optimization creates asymmetric controllability costs: when users explicitly request narrower answers, which assistant behaviors remain suppressible, and which continue to shape the response? We study this problem as boundary-suppression asymmetry. Prompt-side probes across multiple high-level response dimensions suggest a selective cost, concentrated around 'too-much assistant' directions such as over-completion, extra help, and anti-underanswering. Using controlled assistant-policy variants derived from a shared base model, we find that anti-underanswering policies are harder to pull back than the baseline under matched boundary-control evaluations, while minimal-boundary variants generally avoid this anti-side upward shift in the direct boundary-control…
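The following is a purely illustrative sketch and does not describe the paper's method or metric. It shows one way to make "harder to pull back than the baseline under matched boundary-control evaluations" concrete. For each policy, it measures how often a "too-much" behavior (for example, extra help) still appears in responses to prompts that explicitly request a narrower answer. It then compares that residual rate with the baseline's rate on the same prompts. The `Judgment` record, the judge labels, the policy names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One response to a prompt that explicitly asks for a narrower answer (hypothetical record)."""
    prompt_id: str
    behavior_present: bool  # hypothetical judge label: the 'too-much' behavior still appeared


def residual_rate(judgments: list[Judgment]) -> float:
    """Fraction of boundary-control prompts on which the behavior was NOT suppressed."""
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(j.behavior_present for j in judgments) / len(judgments)


def upward_shift(policy: list[Judgment], baseline: list[Judgment]) -> float:
    """Policy residual rate minus baseline residual rate on matched prompts.

    A positive value means the policy is harder to pull back than the baseline.
    """
    if {j.prompt_id for j in policy} != {j.prompt_id for j in baseline}:
        raise ValueError("policy and baseline must be evaluated on the same prompts")
    return residual_rate(policy) - residual_rate(baseline)


# Hypothetical labels for three policy variants on the same four prompts.
baseline = [Judgment("p1", False), Judgment("p2", True), Judgment("p3", False), Judgment("p4", False)]
anti_underanswering = [Judgment("p1", True), Judgment("p2", True), Judgment("p3", False), Judgment("p4", True)]
minimal_boundary = [Judgment("p1", False), Judgment("p2", True), Judgment("p3", False), Judgment("p4", False)]

print(upward_shift(anti_underanswering, baseline))  # prints 0.5
print(upward_shift(minimal_boundary, baseline))     # prints 0.0
```

The sketch scores each policy as a difference from the baseline on an identical prompt set. This is one simple way to honor "matched" evaluations, because prompt difficulty cancels out of the comparison.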