Latent Performance Profiling of Large Language Models (Event)

OPEN_SOURCE · 2026-05-29 · Impact: MEDIUM

arXiv:2605.30018v1 · Announce Type: new

Abstract: Large language models (LLMs) frequently achieve impressive scores on standardized benchmarks, yet accuracy alone offers a limited view of their capabilities. Evaluating open-source LLMs through leaderboards faces persistent issues, including data contamination, narrow task scope, and weak alignment with real-world reliability. Benchmark-based evaluations such as MMLU-Pro, BBH, or IFEval primarily cap…
