Efficient LLM Moderation with Multi-Layer Latent Prototypes

arXiv cs.CL, 2026-06-02. Authors: Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert

Abstract

arXiv:2502.16174v4

Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs and are difficult to customize to user-specific requirements. Motivated by this gap, we introduce Multi-Layer Prototype Moderator (MLPM), a lightweight and highly customizable input moderation tool. We propose leveraging prototypes of intermediate representations across multiple layers to improve moderation quality while maintaining high efficiency. By design, our method adds negligible overhead to the generation pipeline and can be seamlessly applied to any model. MLPM achieves state-of-the-art performance on diverse moderation benchmarks and demonstrates strong scalability across model families of various sizes.
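The abstract does not say how the prototypes are built or how the layers are combined, so the following is only a minimal sketch of the general idea of multi-layer prototype scoring, not the authors' MLPM. It assumes that each prompt has already been reduced to one feature vector per layer (for example, a hidden state taken from the model), that a prototype is the per-class mean of those vectors, and that layers are combined by a plain average of distance margins. The function names `fit_prototypes` and `harmful_score`, the Euclidean distance, and the toy data are all illustrative assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fit_prototypes(features, labels):
    """Build one prototype per (layer, class).

    features[n][l] is the feature vector of example n at layer l.
    labels[n] is 0 (safe) or 1 (harmful).
    Returns a list indexed by layer; each entry maps class -> mean vector.
    Assumption: a prototype is a simple class mean (hypothetical choice).
    """
    num_layers = len(features[0])
    prototypes = []
    for layer in range(num_layers):
        per_class = {}
        for cls in sorted(set(labels)):
            vecs = [features[n][layer] for n in range(len(labels)) if labels[n] == cls]
            per_class[cls] = mean_vector(vecs)
        prototypes.append(per_class)
    return prototypes


def harmful_score(example, prototypes):
    """Average, over layers, of (distance to safe) - (distance to harmful).

    A positive score means the example sits closer to the harmful
    prototypes. Assumption: layers are weighted equally.
    """
    margins = [
        math.dist(vec, protos[0]) - math.dist(vec, protos[1])
        for vec, protos in zip(example, prototypes)
    ]
    return sum(margins) / len(margins)


# Toy data: 4 examples, 2 layers, 2-dimensional features per layer.
train_features = [
    [[0.0, 0.0], [0.0, 1.0]],  # safe
    [[1.0, 0.0], [0.0, 0.0]],  # safe
    [[4.0, 4.0], [5.0, 4.0]],  # harmful
    [[5.0, 5.0], [4.0, 5.0]],  # harmful
]
train_labels = [0, 0, 1, 1]

prototypes = fit_prototypes(train_features, train_labels)
flagged = harmful_score([[4.0, 5.0], [5.0, 5.0]], prototypes) > 0
allowed = harmful_score([[0.0, 0.0], [0.0, 0.0]], prototypes) > 0
```

In a sketch like this, scoring costs only a few distance computations per layer on features the model already produces, which is one way to read the abstract's claim of negligible overhead. Customizing the moderator would amount to refitting the prototypes on differently labeled examples.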
