Interactively optimizing information retrieval systems as a dueling bandits problem (Paper)
2009; cited by 281
Advanced Bandit Algorithms Research; Machine Learning and Algorithms; Optimization and Search Problems
Abstract
We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, we require only pairwise comparisons, which have been shown to be reliably inferred from implicit feedback (Joachims et al., 2007; Radlinski et al., 2008b). We present an algorithm with theoretical guarantees as well as simulation results.
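To illustrate what learning from pairwise comparisons alone can look like, the following is a minimal sketch of a dueling-style update loop. It is not taken from the abstract and should not be read as the paper's algorithm: the function names, the step sizes `delta` and `gamma`, the quadratic `utility`, and the noise-free `noiseless_duel` oracle are all assumptions made for illustration. In a live retrieval system the duel outcome would be noisy and inferred from implicit feedback rather than computed from a known utility.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dueling_update_loop(w, duel, steps, delta=1.0, gamma=0.1, seed=0):
    """Improve parameters `w` using only pairwise comparison outcomes.

    `duel(current, candidate)` must return True when the candidate wins.
    `delta` (exploration step) and `gamma` (update step) are hypothetical
    defaults chosen for this sketch.
    """
    rng = random.Random(seed)
    w = list(w)
    for _ in range(steps):
        u = random_unit_vector(len(w), rng)
        # Propose a perturbed candidate along a random direction.
        candidate = [wi + delta * ui for wi, ui in zip(w, u)]
        # Move a small step toward the candidate only if it wins the duel.
        if duel(w, candidate):
            w = [wi + gamma * ui for wi, ui in zip(w, u)]
    return w


def utility(w):
    """Hypothetical hidden utility: negative squared distance to a target."""
    target = [1.0, -2.0]
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def noiseless_duel(current, candidate):
    """Assumed noise-free comparison oracle, used here in place of user feedback."""
    return utility(candidate) > utility(current)


if __name__ == "__main__":
    start = [0.0, 0.0]
    final = dueling_update_loop(start, noiseless_duel, steps=200)
    print("start utility:", utility(start))
    print("final utility:", utility(final))
```

The learner never observes `utility` values directly; it sees only which of two parameter settings won, which is the kind of feedback the abstract says can be inferred reliably from user behavior. With a concave utility, a noise-free oracle, and `gamma` no larger than `delta`, each accepted step in this sketch cannot lower the utility.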