Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2509.07961v2 · Announce Type: replace

Abstract: We develop new experimental paradigms for measuring welfare in language models. We compare verbal reports of models about their preferences with preferences expressed through behavior when navigating a virtual environment and selecting conversation topics. We also test how costs and rewards affect behavior and whether responses to an eudaimonic w
