PageLLM: A Multi-Grained Reward Framework for Whole-Page Optimization with Large Language Models

ArXiv CS.AI | 2026-05-26 | Authors: Xinyuan Wang, Liang Wu, Dongjie Wang, Yanjie Fu

Abstract

arXiv:2506.09084v2 (announce type: replace-cross)

Whole-page optimization (WPO) decides how search and recommendation results are surfaced to users, and large language models (LLMs) open a new route to it by treating page generation as sequence generation. Adapting LLMs to web-scale WPO, however, remains bottlenecked by two problems: the need for costly human annotations, and the mismatch in granularity between page-level coherence and item-level placement. In this work we show that these two challenges are coupled: implicit user feedback alone suffices for alignment, provided the reward signal is decoupled into two complementary granularities. We propose PageLLM, a reward-based fine-tuning framework that (i) turns implicit feedback into four contrastive preference-pair families covering relevance, ranking, diversity, and redundancy, (ii) learns a coarse page-level reward and a fine item-level reward that captures engagement-sensitive position swaps, and (iii) combines both rewards in…

The abstract may be incomplete; see the original article for the full text.
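
To make the idea of contrastive preference pairs built from implicit feedback more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how pairs are constructed or which loss is used. The helper `swap_pairs`, its adjacent-swap rule, and the use of a Bradley-Terry style pairwise loss are all assumptions chosen for illustration, and the names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Page = List[str]  # ordered item ids, top of the page first


def bradley_terry_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise logistic loss: -log(sigmoid(r_preferred - r_rejected)).

    A standard choice for training reward models on preference pairs;
    assumed here, not confirmed by the abstract.
    """
    margin = r_preferred - r_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def swap_pairs(page: Page, clicked: Set[str]) -> List[Tuple[Page, Page]]:
    """Hypothetical item-level pairs from click feedback.

    For every adjacent (unclicked, clicked) pair of items, the page with
    the clicked item moved up one slot is treated as preferred over the
    page that was actually shown. Returns (preferred, rejected) tuples.
    """
    pairs = []
    for i in range(len(page) - 1):
        upper, lower = page[i], page[i + 1]
        if upper not in clicked and lower in clicked:
            swapped = page.copy()
            swapped[i], swapped[i + 1] = lower, upper
            pairs.append((swapped, page))
    return pairs
```

For example, `swap_pairs(["a", "b", "c"], {"b"})` yields one pair in which the page `["b", "a", "c"]` is preferred over the shown page `["a", "b", "c"]`. A reward model scoring both pages would then be trained to minimize `bradley_terry_loss` on those scores. How the page-level and item-level rewards are combined is cut off in the abstract and is deliberately left out of this sketch.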