Cultural Value Alignment Via Latent Activation Steering in Large Language Models

ArXiv CS.CL | 2026-05-27 | NEWS | en | Authors: Trung Duc Anh Dang, Sarah Masud

Details

Source site: ArXiv CS.CL
Authors: Trung Duc Anh Dang, Sarah Masud
Article type: NEWS
Language: en
Published: 2026-05-27

Abstract

arXiv:2605.26365v1 (announce type: new). Large Language Models (LLMs) often exhibit homogenized cultural perspectives. While the World Values Survey (WVS) provides a gold standard for mapping human values, directly prompting LLMs with WVS questions often fails to access the model's latent cultural depth and instead yields safety-aligned refusals or neutral responses. Here, we propose a generalizable framework for cultural evaluation and intervention that moves from abstract queries to scenario-based behavioral probing. By extracting implicit token probabilities across 300 situational dilemmas, we bypass surface-level alignment to map the latent coordinates of LLMs' cultural values. We further introduce activation steering to shift these internal alignments during the forward pass without retraining.
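The two mechanisms named in the abstract, reading answer-option token probabilities and shifting activations during the forward pass, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not say how the steering direction is obtained, so the mean-difference construction here is a commonly used stand-in rather than the authors' method, and the function names, the toy vectors, and the scale `alpha` are hypothetical.

```python
import math


def option_probabilities(option_logits):
    """Softmax restricted to the answer-option tokens of one dilemma.

    `option_logits` maps an option label (e.g. "A", "B") to the logit the
    model assigned to that token. Hypothetical input format.
    """
    peak = max(option_logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in option_logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def mean_difference_direction(pos_acts, neg_acts):
    """Assumed steering direction: mean activation of one group of prompts
    minus the mean activation of a contrasting group."""
    dim = len(pos_acts[0])
    mean_pos = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    mean_neg = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden, direction, alpha):
    """Add `alpha` times the unit-normalized direction to a hidden state.

    No model weights change; only the activation is shifted.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        return list(hidden)  # nothing to steer along
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


# Toy usage with made-up numbers.
probs = option_probabilities({"A": 2.0, "B": 0.0})
direction = mean_difference_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0]])
steered = steer([1.0, 1.0], direction, alpha=0.5)
```

In a real model the hidden state would be a layer's output tensor and the shift would be applied inside the forward pass, for example through a hook on that layer; the plain-list version above only shows the arithmetic.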
