Boundary Suppression Asymmetry in Post-trained Assistants: Over-expansion as a Controllability Cost

ArXiv CS.CL | 2026-05-28 | Author: Jiarui Han

Abstract

arXiv:2605.27969v1 (announce type: new)

Post-trained language-model assistants are often optimized to avoid under-answering, encouraging complete, helpful, cautious, and proactive responses. We ask whether this optimization creates asymmetric controllability costs: when users explicitly request narrower answers, which assistant behaviors remain suppressible, and which continue to shape the response? We study this problem as boundary-suppression asymmetry. Prompt-side probes across multiple high-level response dimensions suggest a selective cost, concentrated around "too-much assistant" directions such as over-completion, extra help, and anti-underanswering. Using controlled assistant-policy variants derived from a shared base model, we find that anti-underanswering policies are harder to pull back than the baseline under matched boundary-control evaluations, while minimal-boundary variants generally avoid this anti-side upward shift in the direct boundary-control…

The abstract may be incomplete; see the original paper for the full text.
