Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content 文章

ArXiv CS.CL2026-05-26NEWSen作者: Shaz Furniturewala, Arkaitz Zubiaga

Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content · 相关技术