Latent Performance Profiling of Large Language Models (Event)

Name: Latent Performance Profiling of Large Language Models
Start: 2026-05-29

OPEN_SOURCE | 2026-05-29 | Impact: MEDIUM

Latent Performance Profiling of Large Language Models (arXiv:2605.30018v1, Announce Type: new)

Abstract: Large language models (LLMs) frequently achieve impressive scores on standardized benchmarks, yet accuracy alone offers a limited view of their capabilities. Evaluating open-source LLMs through leaderboards faces persistent issues like data contamination, narrow task scope, and weak alignment with real-world reliability. Benchmark-based evaluations such as MMLU-Pro, BBH, or IFEval primarily cap…

Category: Artificial Intelligence
