Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26840v1 | Announce Type: new

Abstract: Reinforcement learning with evaluation metrics as rewards is widely used to enhance specific capabilities of language models. However, for tasks such as factually consistent summarisation, existing metrics remain underdeveloped, limiting their effectiveness as signals for shaping model behaviour. While individual factuality metrics are unreliable