See, Infer, Intervene: Proactive World Modeling for Goal-Oriented Social Intelligence

arXiv cs.CL, 2026-06-03
Authors: Honghui Zhang, Chenmeinian Guo, Yichen Yu, Guanyu Liu, Yongming Qin, Chongguo Song, Mengyue Yang, Lei Yu, Tianyu Shi

Abstract

arXiv:2606.03371v1 (Announce Type: new)

Multimodal retail agents should not only recognize what a customer is doing, but also decide whether and how to assist before an explicit request is made. We study this setting through the See-Infer-Intervene (SII) framework, where a device must see pre-interaction behavior, infer latent customer intent, and act by selecting an appropriate service intervention or choosing to wait. We instantiate SII with the Proactive Intent World Model (PIWM), which represents customer state with AIDA (Attention, Interest, Desire, Action) purchasing phases and BDI (belief, desire, intention) psychological fields, predicts action-conditioned intent transitions, and selects from five response classes: Greet, Elicit, Inform, Recommend, and Hold. We further construct GuidanceSalesBench, a smart-retail benchmark containing state manifests, pre-interaction videos, candidate responses, action-conditioned outcomes, and best-action labels.
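The abstract does not describe how PIWM is implemented, so the following is only a toy sketch of the decision loop it outlines: a customer state made of an AIDA phase plus BDI fields, an action-conditioned prediction of the next phase, and a choice among the five response classes, where Hold means "wait". Every name here (`AIDA`, `Response`, `CustomerState`, `TOY_TRANSITIONS`, `INTERVENTION_COST`, `predict_next_phase`, `utility`, `select_response`) is hypothetical. The hand-written transition table and the fixed intervention cost stand in for whatever learned, video-conditioned predictor the paper actually uses. The BDI fields are stored but ignored by the toy model.

```python
from dataclasses import dataclass
from enum import Enum


class AIDA(Enum):
    """AIDA purchasing phases, ordered from earliest to latest."""
    ATTENTION = 0
    INTEREST = 1
    DESIRE = 2
    ACTION = 3


class Response(Enum):
    """The five response classes named in the abstract."""
    GREET = "Greet"
    ELICIT = "Elicit"
    INFORM = "Inform"
    RECOMMEND = "Recommend"
    HOLD = "Hold"


@dataclass(frozen=True)
class CustomerState:
    phase: AIDA
    belief: str     # hypothetical free-text BDI field (unused by the toy model)
    desire: str     # hypothetical free-text BDI field (unused by the toy model)
    intention: str  # hypothetical free-text BDI field (unused by the toy model)


# Hypothetical toy transition model: (current phase, response) -> predicted
# next phase. Pairs not listed leave the phase unchanged. A real world model
# would learn this mapping instead of hard-coding it.
TOY_TRANSITIONS = {
    (AIDA.ATTENTION, Response.GREET): AIDA.INTEREST,
    (AIDA.INTEREST, Response.ELICIT): AIDA.DESIRE,
    (AIDA.INTEREST, Response.INFORM): AIDA.DESIRE,
    (AIDA.DESIRE, Response.RECOMMEND): AIDA.ACTION,
}

# Hypothetical penalty for interrupting the customer; Hold costs nothing.
INTERVENTION_COST = 0.1


def predict_next_phase(state: CustomerState, response: Response) -> AIDA:
    """Predict the AIDA phase after the device takes `response`."""
    return TOY_TRANSITIONS.get((state.phase, response), state.phase)


def utility(state: CustomerState, response: Response) -> float:
    """Phase progress gained, minus a cost for any non-Hold intervention."""
    gain = predict_next_phase(state, response).value - state.phase.value
    cost = 0.0 if response is Response.HOLD else INTERVENTION_COST
    return gain - cost


def select_response(state: CustomerState) -> Response:
    """Pick the highest-utility response; ties go to the earliest enum member."""
    return max(Response, key=lambda r: utility(state, r))


if __name__ == "__main__":
    browsing = CustomerState(AIDA.ATTENTION, "store may stock jackets",
                             "find a jacket", "look around")
    deciding = CustomerState(AIDA.ACTION, "this jacket fits",
                             "buy it", "walk to checkout")
    print(select_response(browsing).value)  # Greet
    print(select_response(deciding).value)  # Hold
```

Under this toy scoring, a customer who is already acting gains nothing from an intervention, so every non-Hold response scores below zero and the device waits. This mirrors, in a very simplified form, the abstract's point that choosing not to intervene is itself a decision.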