K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts (Event)

Name: K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
Start: 2026-06-02

Type: PRODUCT_LAUNCH
Date: 2026-06-02
Impact: MEDIUM

arXiv:2606.02404v1
Announce Type: new

Abstract: Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400 problems. The 300-problem K-BrowseComp-Verified subset is manually constructe…

Category: Artificial Intelligence
