From Context Shift to Stylistic Collapse: Why Training Objectives Matter More Than Scale

ArXiv CS.CL | 2026-05-29 | Author: Rohan Mahapatra

Abstract

arXiv:2605.28826v1 (announce type: new)

In modern LLMs, linguistic features function not as stylistic artifacts but as probes of probability mass, allocated under training alignment objectives. Language models trained with contemporary pipelines exhibit severe reshaping of linguistic features, leading to extreme language re-distribution. While previous stylometric analyses explored linguistic differences between AI-generated and human texts, we focus on the reshaping plaguing the LLM training pipeline itself. We analyze 17 models (410M-100B+ parameters) across 24 linguistically-motivated probes, documenting that instruction-tuned systems systematically collapse language entropy along discourse and structural dimensions (mean amplification: 1,949-16,853%, peaks: 5,181-209,675%), while selectively suppressing complex punctuation to 3.2-23.2% of baseline frequencies. These effects do not worsen under RLHF, as divergence patterns are statistically indistinguishable (p > 0.
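To make the "percent of baseline frequency" figures concrete, here is a minimal Python sketch of how one such comparison could be computed. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the probe definitions, the normalization, or the baseline corpus, so the per-1,000-token rate, the semicolon regex, and the names `rate_per_1k_tokens` and `percent_of_baseline` are all hypothetical.

```python
import re


def rate_per_1k_tokens(text: str, pattern: str) -> float:
    """Occurrences of a regex pattern per 1,000 whitespace-delimited tokens.

    Assumption: whitespace tokenization and a per-1k rate; the paper's
    actual normalization is not stated in the abstract.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = len(re.findall(pattern, text))
    return 1000.0 * hits / len(tokens)


def percent_of_baseline(model_rate: float, baseline_rate: float) -> float:
    """Express a model's feature rate as a percentage of the baseline rate."""
    if baseline_rate == 0:
        raise ValueError("baseline rate is zero; the ratio is undefined")
    return 100.0 * model_rate / baseline_rate


# Hypothetical probe: semicolons as one example of "complex punctuation".
SEMICOLON = r";"

# Toy texts invented for illustration only (not data from the paper).
baseline_text = "We ran; we rested; we ran again; then we slept; all was well."
tuned_text = "We ran. We rested. We ran again; then we slept. All was well."

baseline_rate = rate_per_1k_tokens(baseline_text, SEMICOLON)
tuned_rate = rate_per_1k_tokens(tuned_text, SEMICOLON)
print(f"semicolon rate, % of baseline: {percent_of_baseline(tuned_rate, baseline_rate):.1f}")
```

Under this reading, a value below 100 indicates suppression of the feature relative to the baseline and a value above 100 indicates amplification. Whether the reported amplification percentages are defined exactly this way is not stated in the abstract.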
