SLMJury: Can Small Language Models Judge as Well as Large Ones?

PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.07810v1 | Announce Type: cross

Abstract: Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges (0.6B-14B parameters) from four model