Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

arXiv cs.CL, 2026-06-02. Authors: Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek

Abstract

arXiv:2606.00334v1 (announce type: new)

Various language domains have undergone remarkable changes in recent years. These shifts are largely attributed to the advent of Large Language Models (LLMs) and their misalignment with natural language usage. These misalignments are thought to originate partly in the preference-learning stage, e.g., Reinforcement Learning from Human Feedback (RLHF), which generally makes models more useful but may simultaneously introduce systematic lexical bias. Lexically, this bias shows up as a model's preference for certain formats or its overuse of particular words (e.g., "delve", "furthermore"), even when such patterns are absent from base-model outputs. Research on lexical misalignment induced during preference training is currently constrained by its reliance on manual curation.
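The abstract does not describe the triangulated metric itself, so the following is only a rough illustration of the underlying phenomenon, not the authors' method. One simple way to surface words that become overused after preference training is to compare per-word rates in preference-tuned outputs against base-model outputs, here using add-alpha smoothed log rate ratios. The function names `word_counts` and `overuse_ratios`, the tokenizer regex, the smoothing parameter `alpha`, and the toy texts are all hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word tokens across a list of texts (hypothetical tokenizer)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def overuse_ratios(base_texts, tuned_texts, alpha=1.0):
    """Smoothed log ratio of per-token rates, tuned outputs vs. base outputs.

    Positive values mean a word is relatively more frequent in the
    preference-tuned outputs than in the base-model outputs.
    Add-alpha smoothing keeps words unseen on one side from producing log(0).
    """
    base = word_counts(base_texts)
    tuned = word_counts(tuned_texts)
    vocab = set(base) | set(tuned)
    # Smoothed denominators: total tokens plus alpha pseudo-counts per vocabulary word.
    base_total = sum(base.values()) + alpha * len(vocab)
    tuned_total = sum(tuned.values()) + alpha * len(vocab)
    return {
        w: math.log((tuned[w] + alpha) / tuned_total)
        - math.log((base[w] + alpha) / base_total)
        for w in vocab
    }


# Toy illustration with made-up outputs for the same two prompts.
base = ["The results show a clear trend.", "We look at the data."]
tuned = ["Let us delve into the results.", "Furthermore, we delve into the data."]
ratios = overuse_ratios(base, tuned)
for word in ("delve", "furthermore", "show"):
    print(word, round(ratios[word], 3))
```

In this toy run "delve" and "furthermore" receive positive scores while "show" receives a negative one. A frequency contrast like this needs no hand-picked word list, but on its own it cannot separate preference-stage effects from prompt or topic differences, which is one reason a more careful, multi-signal metric is needed.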
