PromptAudit: Auditing Prompt Sensitivity in LLM-Based Vulnerability Detection

ArXiv CS.AI | 2026-05-26 | Authors: Steffen J. Camarato, Yahya Hmaiti, Mandana Ghadamian, David Mohaisen

Abstract

arXiv:2605.24171v1 (cross-listed)

Large language models are increasingly used for vulnerability detection, yet their reliability under different prompt formulations remains uncharacterized. We present PromptAudit, a controlled evaluation framework that isolates prompt effects by fixing the dataset, decoding, and parsing while varying only the prompting strategy. Using five prompting strategies across five open-weight models on 1,000 CVEs (6,074 code samples spanning 16 programming languages), we evaluate accuracy, recall, abstention, coverage, and effective F1. We find that standard chain-of-thought prompting achieves the strongest overall operational performance, while few-shot prompting provides model-dependent benefits that are most pronounced for prompt-sensitive models. In contrast, adaptive chain-of-thought frequently suppresses recall and self-consistency induces excessive abstention, sharply reducing effective performance.
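To make the interplay between abstention and effective performance concrete, here is a minimal sketch of how metrics of this kind could be computed. The abstract does not give the paper's definitions, so everything below is an assumption made for illustration: abstentions are represented as `None`, accuracy is taken over all samples, and "effective F1" is read as an F1 score in which an abstention on a vulnerable sample counts as a miss. The names `Metrics` and `evaluate` are hypothetical and are not taken from PromptAudit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metrics:
    coverage: float      # share of samples with a parsed yes/no answer
    abstention: float    # share of samples without a usable answer
    accuracy: float      # correct answers over ALL samples (assumption)
    recall: float        # detected vulnerable samples over all vulnerable samples
    effective_f1: float  # F1 with abstentions counted as misses (assumption)


def evaluate(labels: Sequence[bool], preds: Sequence[Optional[bool]]) -> Metrics:
    """Score binary vulnerability predictions; None marks an abstention."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("labels and preds must be non-empty and equal in length")

    n = len(labels)
    answered = [(y, p) for y, p in zip(labels, preds) if p is not None]
    coverage = len(answered) / n

    correct = sum(1 for y, p in answered if y == p)
    tp = sum(1 for y, p in answered if y and p)
    fp = sum(1 for y, p in answered if not y and p)
    positives = sum(1 for y in labels if y)

    # Abstained vulnerable samples stay in the denominator, so they lower recall.
    recall = tp / positives if positives else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return Metrics(coverage, 1.0 - coverage, correct / n, recall, f1)


if __name__ == "__main__":
    # Four samples: two vulnerable, two safe; the model abstains on one.
    print(evaluate([True, True, False, False], [True, None, False, True]))
```

Under these assumptions, a strategy that abstains often loses both coverage and recall even if its answered predictions are accurate, which is one way excessive abstention could reduce effective performance as the abstract describes.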
