PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges | Event

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.30803v1 | Announce Type: new

Abstract: LLM judges are increasingly used to evaluate open-ended responses, but their scores depend strongly on the rubrics that condition them. A vague rubric asking for a response to be "helpful and factual" can reward polished answers that invent facts or violate user intent. We treat reusable rubrics as measurement specifications: changing the rubric changes the response quality mea…
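The abstract's central claim, that the rubric determines what quantity a judge's score actually measures, can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the PReMISE method: it assumes a rubric is a set of weighted criteria, that a judge's score is the weighted mean of per-criterion ratings in [0, 1], and that a vague rubric effectively weights only surface qualities. The names `Rubric`, `score`, the criteria, and all ratings and weights are invented for illustration.

```python
from dataclasses import dataclass


# Hypothetical toy model (not from the PReMISE paper): a rubric is a set of
# weighted criteria, and the judge's score is the weighted mean of the
# per-criterion ratings (each in [0, 1]) for the criteria the rubric names.
@dataclass
class Rubric:
    name: str
    weights: dict[str, float]  # criterion -> non-negative weight

    def score(self, ratings: dict[str, float]) -> float:
        """Weighted mean of the ratings, over the criteria this rubric measures."""
        total = sum(self.weights.values())
        return sum(w * ratings[c] for c, w in self.weights.items()) / total


# Invented ratings for a fluent answer that fabricates facts and drifts from intent.
polished_but_fabricated = {
    "fluency": 0.9,
    "apparent_helpfulness": 0.9,
    "claims_supported": 0.2,
    "follows_user_intent": 0.4,
}

# Assumption: a vague "helpful and factual" rubric ends up rewarding surface quality.
vague = Rubric("helpful and factual", {"fluency": 1.0, "apparent_helpfulness": 1.0})

# A more fully specified rubric names what must be checked and weights it explicitly.
specified = Rubric(
    "specified",
    {
        "fluency": 0.5,
        "apparent_helpfulness": 0.5,
        "claims_supported": 2.0,
        "follows_user_intent": 1.0,
    },
)

# Same response, different rubric, different measured quantity.
for rubric in (vague, specified):
    print(f"{rubric.name}: {rubric.score(polished_but_fabricated):.3f}")
```

With these made-up numbers the vague rubric scores the response 0.900 while the specified rubric scores it 0.425. The response is unchanged, but the rubric has changed which property of it is being measured.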