摘要
arXiv:2602.05056v2 Announce Type: replace-cross Abstract: Online scams increasingly leverage fluent and context-aware social engineering strategies, creating growing demand for AI systems that explain why a message may be risky. However, explanations that cite detector-derived evidence may still semantically weaken or redirect the intended risk interpretation. We introduce VEXA: Verifying Semantic Explanation Alignment, a controlled testbed for studying the gap between lexical grounding and semantic risk alignment in AI-generated scam-risk explanations. VEXA generates ungrounded, risk-aligned, and risk-diluting explanations by independently controlling evidence grounding and semantic framing. Through LLM-as-a-judge and human evaluations, we show that explanations may continue to appear comparatively grounded even when their semantic interpretation weakens the detector's intended risk assessment.
相关事件查看全部 (1)
相关公司
暂无数据
相关人物
暂无数据