Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics
Event: PRODUCT_LAUNCH | Date: 2026-05-27 | Impact: MEDIUM
arXiv:2605.26840v1 (Announce Type: new)
Abstract: Reinforcement learning with evaluation metrics as rewards is widely used to enhance specific capabilities of language models. However, for tasks such as factually consistent summarisation, existing metrics remain underdeveloped, limiting their effectiveness as signals for shaping model behaviour. While individual factuality metrics are unreliable