K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.02404v1 | Announce Type: new
Abstract: Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400 problems. The 300-problem K-BrowseComp-Verified subset is manually constructe
Related coverage (1)
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
arXiv cs.CL, 2026-06-02