Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences

arXiv cs.AI | 2026-06-01 | Authors: Jabin Koo, Hoyoung Kim, Minwoo Jang, Jungseul Ok

Abstract

arXiv:2605.30873v1 (announce type: cross)

Federated Learning (FL) offers a privacy-preserving pathway for aligning Large Language Models (LLMs); however, existing frameworks typically enforce a monolithic reward model, inevitably averaging out inherently conflicting user preferences (e.g., helpfulness vs. harmlessness). While Variational Preference Learning (VPL) offers a pathway to personalization, adapting it to decentralized settings presents a fundamental challenge: posterior collapse driven by severe local data scarcity and heterogeneity. In this paper, we propose Federated Variational Preference Alignment with Gumbel-Softmax Prior (FedVPA-GP), a framework designed to disentangle diverse preferences without compromising privacy. To stabilize variational inference, we introduce a Federated Mixture Prior that enables clients to leverage the aggregate population distribution as a dynamic prior.
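The abstract does not say how the Gumbel-Softmax relaxation or the Federated Mixture Prior is computed, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each client holds a categorical posterior over K hypothetical preference modes, that the server forms the shared prior as a weighted average of those posteriors, and that each client regularizes toward it with a KL term. The function names (`gumbel_softmax_sample`, `aggregate_prior`, `categorical_kl`) and the averaging rule are illustrative assumptions.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical with the given logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_prior(client_posteriors, client_weights):
    """ASSUMED server step: weighted average of client categorical posteriors."""
    total_w = sum(client_weights)
    k = len(client_posteriors[0])
    return [
        sum(w * q[i] for q, w in zip(client_posteriors, client_weights)) / total_w
        for i in range(k)
    ]


def categorical_kl(q, p, eps=1e-12):
    """KL(q || p) between two categorical distributions given as probability lists."""
    return sum(
        qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p) if qi > 0
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical clients with conflicting preferences over K = 2 modes.
    posteriors = [[0.9, 0.1], [0.2, 0.8]]
    prior = aggregate_prior(posteriors, client_weights=[1, 1])
    for q in posteriors:
        print("KL to shared prior:", categorical_kl(q, prior))
    print("relaxed sample:", gumbel_softmax_sample([0.0, 1.0, 2.0], 0.5, rng))
```

In this toy setup a lower temperature pushes the relaxed sample closer to a one-hot vector, and the KL term is what would pull a data-poor client's posterior toward the population-level prior rather than toward a fixed uniform one.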