Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits
Event: PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM
arXiv:2605.30913v1 (announce type: new)
Abstract: Large language models (LLMs) are increasingly deployed in conversational settings where user tone ranges from polite to adversarial or toxic, yet little is known about whether toxic language in otherwise semantically equivalent prompts degrades factual reliability. We study how lexical and tone-based prompt perturbations affect the factual reliability of LLMs. Using controlled pro