VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild (Event)

Name: VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild
Start: 2026-05-28

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2605.27882v1 (Announce Type: new)

Abstract: LLM-based agents score well on search benchmarks, yet real users consistently find results unsatisfying, revealing a persistent evaluation-experience gap. We attribute this gap to existing benchmarks' reliance on over-specified queries, single-turn interactions, and fixed-schema evaluation, none of which reflect real search behavior where users and agents collaboratively refin

Artificial Intelligence
