Abstract
arXiv:2512.10234v2

Even LLMs that appear safe during evaluation can still produce harmful responses in deployment. Because stochastic sampling yields different responses to the same prompt, low-probability harmful outputs can still reach users at scale. In common human evaluation workflows, evaluators generate many random samples per prompt and review them in static spreadsheets. This practice scales poorly, because evaluators must repeatedly reread near-duplicate prefixes. To address this, we present InFerActive, an interactive system that visualizes sampling results as a navigable tree of readable phrases, allowing evaluators to filter, explore, and extend the generation space on demand. InFerActive uses breadth-first sampling, a novel tree construction procedure that matches the harmful-response coverage of random sampling while requiring up to 5.0x fewer samples.
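To make the tree idea concrete, here is a minimal sketch of how responses that share a prefix could be merged so each shared prefix is read only once. It is not the paper's breadth-first sampling procedure, which the abstract does not specify. The names `Node`, `build_prefix_tree`, and `render` are hypothetical, the samples are invented for illustration, and tokens are assumed to be whitespace-separated words.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the prefix tree; count = number of samples passing through it."""
    count: int = 0
    children: dict[str, "Node"] = field(default_factory=dict)


def build_prefix_tree(samples: list[list[str]]) -> Node:
    """Merge tokenized samples so that shared prefixes are stored once."""
    root = Node()
    for tokens in samples:
        node = root
        node.count += 1
        for tok in tokens:
            # Reuse the child if this prefix was already seen, else create it.
            node = node.children.setdefault(tok, Node())
            node.count += 1
    return root


def render(node: Node, depth: int = 0) -> list[str]:
    """Return an indented outline, one line per tree node, with sample counts."""
    lines = []
    for tok, child in node.children.items():
        lines.append(f"{'  ' * depth}{tok} ({child.count})")
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical samples for a single prompt (assumed data, not from the paper).
samples = [
    "I cannot help with that".split(),
    "I cannot assist with that".split(),
    "I can explain the risks".split(),
]
tree = build_prefix_tree(samples)
print("\n".join(render(tree)))
```

In this sketch the shared prefix "I cannot" appears once with a count of 2 instead of being reread in each row of a spreadsheet. A real system would likely group tokens into readable phrases and weight branches by model probability rather than by raw counts.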