Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation

arXiv cs.CL, 2026-06-01. Authors: Hui Wu, Xiaoyang Wang, Zhong Fan

Abstract

arXiv:2605.31478v1 (announce type: cross)

Large language models (LLMs) are increasingly used to automate power-system analysis, but many utilities and energy-research labs require on-premise serving for confidentiality, regulatory, reproducibility, and cost reasons. This makes the reliability of open-weight models a deployment issue. We show that first-pass failures in power-system code generation are dominated not by reasoning alone, but by structured API-knowledge boundary errors: hallucinated function names, misused parameters, and mishandled result tables in versioned simulation libraries. We introduce PowerCodeBench, an execution-validated benchmark generator that pairs natural-language operator queries with pandapower code and numerical ground truth; an L0-L3 documentation-driven probing procedure that measures per-model API knowledge profiles;
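To make two of the error classes named in the abstract concrete (hallucinated function names and misused parameters), here is a minimal sketch of a static check on generated code. It is an illustration only, not the paper's probing procedure or PowerCodeBench. The helper name `check_api_calls` is hypothetical, and the standard-library `statistics` module stands in for pandapower so the sketch runs without extra dependencies. The same idea might apply to any importable library whose functions expose introspectable signatures.

```python
import ast
import importlib
import inspect


def check_api_calls(source: str, module_name: str, alias: str) -> list[str]:
    """Report calls of the form `alias.func(...)` that the real module cannot satisfy.

    Hypothetical helper for illustration; it only inspects direct attribute
    calls on the given alias and only validates keyword argument names.
    """
    module = importlib.import_module(module_name)
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_alias_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == alias
        )
        if not is_alias_call:
            continue
        name = node.func.attr
        func = getattr(module, name, None)
        if func is None:
            # The generated code calls a function the library does not define.
            problems.append(f"unknown function: {alias}.{name}")
            continue
        try:
            params = inspect.signature(func).parameters
        except (TypeError, ValueError):
            continue  # signature not introspectable; skip the keyword check
        accepts_kwargs = any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        )
        for kw in node.keywords:
            # kw.arg is None for `**mapping` unpacking, which we cannot verify.
            if kw.arg is not None and kw.arg not in params and not accepts_kwargs:
                problems.append(f"unknown keyword for {alias}.{name}: {kw.arg}")
    return problems


if __name__ == "__main__":
    # Stand-in for model-generated code: one invented function, one wrong keyword.
    generated = (
        "import statistics as st\n"
        "xs = [1, 2, 3]\n"
        "st.mean(xs)\n"
        "st.avg(xs)\n"
        "st.quantiles(xs, k=4)\n"
    )
    for problem in check_api_calls(generated, "statistics", "st"):
        print(problem)
```

A check like this catches only name-level mistakes before execution. The third error class in the abstract, mishandled result tables, would still need execution against numerical ground truth to detect.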