FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
Abstract
arXiv:2605.17373v2. AI research agents accelerate ML research by automating hypothesis generation, experimentation, and empirical refinement. Existing agent strategies range from greedy hill-climbing to tree search and evolutionary optimization, yet which strategy choices drive performance remains unclear. Answering this question requires a benchmark that separates agent strategy (e.g., search topology) from execution infrastructure (e.g., code editor), so that performance differences are attributable to strategy rather than infrastructure, and that provides process-level metrics beyond final scores to analyze exploration behaviors. Existing benchmarks offer limited support. We propose FML-Bench, a benchmark of 18 fundamental ML research tasks across 10 domains that separates agent strategy from execution infrastructure and defines 12 process-level behavioral metrics.