A POMDP formulation of preference elicitation problems (paper)
Abstract
Preference elicitation is a key problem facing the deployment of intelligent systems that make or recommend decisions on behalf of users. Since not all aspects of a utility function have the same impact on object-level decision quality, determining which information to extract from a user is itself a sequential decision problem, balancing the amount of elicitation effort and time with decision quality. We formulate this problem as a partially observable Markov decision process (POMDP). Because of the continuous nature of the state and action spaces of this POMDP, standard techniques cannot be used to solve it. We describe methods that exploit the special structure of preference elicitation to deal with parameterized belief states over the continuous state space, and gradient techniques for optimizing parameterized actions. These methods can be used with a number of different belief state representations, including mixture models.
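
To make the trade-off between elicitation effort and decision quality concrete, the following is a minimal sketch and not the method of the paper. It assumes a belief state represented by weighted samples of utility vectors (a simple stand-in for the parameterized belief states mentioned above), a hypothetical noisy yes/no query of the form "is the utility of outcome i at least t?", and a myopic one-step value-of-information criterion rather than a full POMDP solution. All function names and the noise model are illustrative assumptions.

```python
import random

def expected_utility(decision, particles, weights):
    """Expected utility of one decision (a distribution over outcomes) under the belief."""
    return sum(
        w * sum(p * u for p, u in zip(decision, utility))
        for utility, w in zip(particles, weights)
    )

def best_value(decisions, particles, weights):
    """Value of acting now: the best expected utility over the available decisions."""
    return max(expected_utility(d, particles, weights) for d in decisions)

def likelihoods(particles, outcome, threshold, answer, noise):
    """P(answer | utility) for the query 'is u[outcome] >= threshold?' (assumed symmetric noise)."""
    return [
        1.0 - noise if (utility[outcome] >= threshold) == answer else noise
        for utility in particles
    ]

def update(particles, weights, outcome, threshold, answer, noise=0.1):
    """Bayes rule: reweight each sampled utility vector by the likelihood of the answer."""
    like = likelihoods(particles, outcome, threshold, answer, noise)
    unnormalized = [w * l for w, l in zip(weights, like)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("answer has zero probability under the current belief")
    return [w / total for w in unnormalized]

def myopic_evoi(decisions, particles, weights, outcome, threshold, noise=0.1, cost=0.0):
    """One-step expected value of information for a query, net of its cost."""
    base = best_value(decisions, particles, weights)
    value_after = 0.0
    for answer in (True, False):
        like = likelihoods(particles, outcome, threshold, answer, noise)
        p_answer = sum(w * l for w, l in zip(weights, like))
        if p_answer > 0.0:
            posterior = update(particles, weights, outcome, threshold, answer, noise)
            value_after += p_answer * best_value(decisions, particles, posterior)
    return value_after - base - cost

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical prior: 200 utility vectors over 3 outcomes, uniform on [0, 1].
    particles = [[rng.random() for _ in range(3)] for _ in range(200)]
    weights = [1.0 / len(particles)] * len(particles)
    # Each decision yields exactly one outcome with certainty.
    decisions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(myopic_evoi(decisions, particles, weights, outcome=0, threshold=0.5, cost=0.01))
```

Under these assumptions, a simple elicitation loop would ask the query with the largest net value and stop once no query has positive net value. The threshold plays the role of a continuous action parameter, which is where gradient-based optimization of the kind the abstract mentions could replace a grid search.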