SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs
Event · PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM
arXiv:2603.20253v2 · Announce Type: replace-cross

Abstract: Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in phy…
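To see why pass@k strains under a budget when every attempt runs a simulation, consider the small sketch below. `pass_at_k` is the standard unbiased estimator commonly used in code-generation evaluation. `budget_for_k` and `pass_within_budget` are hypothetical helpers that assume each attempt launches one simulation with a fixed cost. They illustrate the general point only and are not SimulCost's metrics, cost model, or API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: chance that at least one of k attempts,
    drawn without replacement from n recorded attempts (c of them successful),
    is a success."""
    if n - c < k:
        # Too few failures exist to fill all k slots, so a success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def budget_for_k(k: int, cost_per_attempt: float) -> float:
    """Hypothetical cost model: each of the k attempts runs one simulation
    with the same tool-use cost (e.g. GPU-hours)."""
    return k * cost_per_attempt


def pass_within_budget(n: int, c: int, budget: float, cost_per_attempt: float) -> float:
    """Hypothetical budget-capped variant: only as many attempts as the budget
    can pay for are allowed, capped at the n attempts actually recorded."""
    k = min(n, int(budget // cost_per_attempt))
    if k == 0:
        # The budget cannot cover even a single simulation.
        return 0.0
    return pass_at_k(n, c, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 recorded attempts, 3 succeeded,
    # and each simulation costs 2.5 GPU-hours.
    n, c, cost = 10, 3, 2.5
    for k in (1, 5, 10):
        print(f"k={k}: pass@k={pass_at_k(n, c, k):.3f}, cost={budget_for_k(k, cost)}")
    print(f"budget=5.0: {pass_within_budget(n, c, 5.0, cost):.3f}")
```

With these made-up numbers, pass@k rises toward 1.0 as k grows, but the tool-use bill grows linearly with k. A fixed budget therefore caps the effective k, which is the tension the abstract describes.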