SDR: Set-Distance Rewards for Radiology Report Generation 文章

ArXiv CS.AI2026-06-02NEWSen作者: Halil Ibrahim Gulluk, Max Van Puyvelde, Wim Van Criekinge, Olivier Gevaert

摘要

arXiv:2606.00440v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision--language models. However, for chest X-ray report generation, the standard rewards (i.e. exact-match accuracy and step-level processes) are incompatible because the reports consist of unordered and orthogonal findings, rather than a causal reasoning chain. We address this gap with a set-based view: each report is split into sentences and embedded by a frozen sentence transformer, yielding unordered embedding sets. We propose the use of set-to-set distances between generated and reference embeddings as continuous, permutation-invariant rewards. Across two datasets and three vision--language models (Qwen3-VL-2B/4B, Gemma3-4B), post-training with set-to-set distance based rewards via GRPO consistently outperforms supervised fine-tuning and exact-match GRPO on all headline metrics (BERTScore, RadGraph F1 and CheXbert F1 by average \%6.80, \%7.82 and \%4.

相关事件查看全部 (1)

SDR: Set-Distance Rewards for Radiology Report Generation
2026-06-02PRODUCT_LAUNCH影响: MEDIUM

相关公司

暂无数据

相关人物

暂无数据