Grounded but Misleading: Evaluating Semantic Alignment in AI-Generated Security Explanations

ArXiv CS.CL | 2026-06-05 | Authors: Heajun An, Connor Ng, Sandesh Sharma Dulal, Junghwan Kim, Jin-Hee Cho

Abstract

arXiv:2602.05056v2

Online scams increasingly leverage fluent and context-aware social engineering strategies, creating growing demand for AI systems that explain why a message may be risky. However, explanations that cite detector-derived evidence may still semantically weaken or redirect the intended risk interpretation. We introduce VEXA (Verifying Semantic Explanation Alignment), a controlled testbed for studying the gap between lexical grounding and semantic risk alignment in AI-generated scam-risk explanations. VEXA generates ungrounded, risk-aligned, and risk-diluting explanations by independently controlling evidence grounding and semantic framing. Through LLM-as-a-judge and human evaluations, we show that explanations may continue to appear comparatively grounded even when their semantic interpretation weakens the detector's intended risk assessment.
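
The gap between lexical grounding and semantic risk alignment can be illustrated with a toy example. The sketch below is not VEXA's method: the phrase-overlap metric, the evidence phrases, and the three sample explanations are all hypothetical, chosen only to show how a purely lexical check might score a risk-diluting explanation as fully grounded.

```python
import re


def tokens(text):
    """Lowercased word set; a deliberately crude tokenizer (assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_grounding(explanation, evidence_phrases):
    """Fraction of evidence phrases whose words all occur in the explanation.

    Hypothetical metric for illustration; it ignores how the evidence is framed.
    """
    expl = tokens(explanation)
    hits = [p for p in evidence_phrases if tokens(p) <= expl]
    return len(hits) / len(evidence_phrases)


# Hypothetical detector-derived evidence for a scam message.
evidence = ["urgent", "gift cards", "verify your account"]

# Three hypothetical explanations mirroring the conditions named in the abstract.
explanations = {
    "risk_aligned": (
        "The message is urgent, asks for gift cards, and tells you to "
        "verify your account, which are common signs of a scam."
    ),
    "risk_diluting": (
        "The message is urgent, mentions gift cards, and asks you to "
        "verify your account, which is routine for many legitimate services."
    ),
    "ungrounded": "This message looks suspicious and you should be careful.",
}

scores = {name: lexical_grounding(text, evidence)
          for name, text in explanations.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under these assumptions, the risk-aligned and risk-diluting explanations cite the same evidence and receive the same lexical score, even though only one supports the intended risk interpretation. Separating the two would require a judgment about semantic framing, which the abstract reports assessing through LLM-as-a-judge and human evaluations.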
