Diagnosing the Reliability of LLM-as-a-Judge via Item Response Theory

arXiv cs.AI · 2026-06-01 · Authors: Junhyuk Choi, Sohhyung Park, Chanhee Cho, Hyeonchu Park, Bugeun Kim