CPPO: Contrastive Perception Policy Optimization for VLM Agents (Event)

PRODUCT_LAUNCH 2026-05-28 Impact: MEDIUM

arXiv:2601.00501v2 Announce Type: replace

Abstract: We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). Reliable perception is a core requirement for VLM-based agents that must reason and act in open-ended environments: faulty visual grounding cascades directly into faulty actions, hallucinated tool calls, and unsafe decisions. While reinforcement learning (RL) has s