VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild
Event: PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM
arXiv:2605.27882v1 Announce Type: new
Abstract: LLM-based agents score well on search benchmarks, yet real users consistently find results unsatisfying, revealing a persistent evaluation-experience gap. We attribute this gap to existing benchmarks' reliance on over-specified queries, single-turn interactions, and fixed-schema evaluation, none of which reflect real search behavior where users and agents collaboratively refin
Related coverage (1)
VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild
ArXiv CS.CL, 2026-05-28