From Simulation to Enaction: Post-trained language models recognize and react to their own generations

Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.25459v1 | Announce Type: cross

Abstract: Language models are pretrained as passive predictors with no incentive to model the consequences of their own outputs. Post-training changes this: a model producing its own responses can benefit from recognizing that it is on-policy. We present evidence that post-trained models recognize their on-policy generations, and this recognition is impl