ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison (Event)
PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM
arXiv:2605.20278v2 · Announce Type: replace-cross

Abstract: Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual claims. A good dense caption should be both faithful and informative, avoiding hallucination without omitting salient details. Yet pairwise preferences, reference-based metr…
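To make the granularity problem concrete, here is a minimal Python sketch contrasting a single score per caption with rewards attached to individual claims. It is not the ClaimDiff-RL method, which the truncated abstract does not describe. It assumes the caption has already been split into claims, each labeled as grounded or hallucinated by some verifier, and that the set of salient details is known in advance. Matching claims to salient details by exact string is a deliberate simplification, and the function names `claim_rewards` and `caption_scores` are hypothetical.

```python
from typing import Dict, Set, Tuple


def claim_rewards(claims: Dict[str, bool]) -> Dict[str, float]:
    """Hypothetical per-claim reward: +1 for a grounded claim, -1 for a hallucinated one.

    Unlike one scalar per caption, this localizes *which* claim was wrong.
    """
    return {claim: (1.0 if grounded else -1.0) for claim, grounded in claims.items()}


def caption_scores(claims: Dict[str, bool], salient: Set[str]) -> Tuple[float, float, float]:
    """Hypothetical caption-level summary built from claim labels.

    precision ~ faithfulness: share of the caption's claims that are grounded.
    recall ~ informativeness: share of salient details the caption covers.
    Assumes exact string matching between claims and salient details.
    """
    grounded = {claim for claim, ok in claims.items() if ok}
    precision = len(grounded) / len(claims) if claims else 0.0
    recall = len(grounded & salient) / len(salient) if salient else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy caption: one hallucinated claim ("two dogs") and one omitted detail ("a bicycle").
    caption = {"a red car": True, "two dogs": False, "a tree": True}
    salient_details = {"a red car", "a tree", "a bicycle"}
    print(claim_rewards(caption))
    print(caption_scores(caption, salient_details))
```

In this toy case precision, recall, and F1 all come out to 2/3, so the sequence-level number alone cannot say whether the caption hallucinated, omitted something, or both; the per-claim rewards point directly at "two dogs", while the omission of "a bicycle" only shows up in recall.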