K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
Event: PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM
arXiv:2606.02404v1 · Announce Type: new
Abstract: Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400 problems. The 300-problem K-BrowseComp-Verified subset is manually constructe
Related coverage
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
arXiv cs.CL · 2026-06-02