Alignment Makes Language Models Normative, Not Descriptive (Event)

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2603.17218v2 (Announce Type: replace)

Abstract: Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games: bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base models outperform their aligned…
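A paired comparison like the one described can be sketched as follows. This is a minimal illustration, not the paper's method. It assumes each model is scored by the average negative log-likelihood (NLL) it assigns to the actions humans actually took, and that a lower NLL means a better description of behavior. The excerpted abstract does not state the metric. The `ModelPair` layout, the `mean_nll` and `base_win_rate` helpers, and the toy probabilities are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ModelPair:
    """Hypothetical record for one base/aligned pair.

    Each list holds the probability that model assigned to the action the
    human actually chose, one entry per recorded human decision.
    """
    name: str
    base_probs: list[float]
    aligned_probs: list[float]


def mean_nll(probs: list[float]) -> float:
    """Average negative log-likelihood of the observed human choices."""
    return -sum(math.log(p) for p in probs) / len(probs)


def base_win_rate(pairs: list[ModelPair]) -> float:
    """Fraction of pairs where the base model fits human choices better (lower NLL)."""
    wins = sum(
        1 for p in pairs if mean_nll(p.base_probs) < mean_nll(p.aligned_probs)
    )
    return wins / len(pairs)


# Toy, made-up probabilities purely to exercise the functions.
pairs = [
    ModelPair("toy-a", base_probs=[0.6, 0.5], aligned_probs=[0.4, 0.3]),
    ModelPair("toy-b", base_probs=[0.2, 0.3], aligned_probs=[0.5, 0.4]),
]
print(base_win_rate(pairs))  # → 0.5
```

Scoring the likelihood of *observed* choices, rather than a preference rating, is what separates a descriptive evaluation from a normative one. A model can produce "better" answers by a preference standard and still put low probability on what people actually did.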