Emergent alignment and the projectability of ethical personas
Event type: PRODUCT_LAUNCH | Date: 2026-06-09 | Impact: MEDIUM
arXiv:2606.09475v1 (announce type: new)
Abstract: Work on 'emergent misalignment' shows that fine-tuning LLMs on narrow tasks can induce broadly misaligned behavior. This supports the 'persona selection' (PSM) hypothesis: during pre-training, LLMs learn to simulate different characters and perspectives, which can be elicited and refined during post-training. This paper investigates the converse phenomenon, 'emergent alignment', and uses …
Source: ArXiv CS.AI, 2026-06-09