Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense 事件

PRODUCT_LAUNCH2026-06-05影响: MEDIUM

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense arXiv:2606.05743v1 Announce Type: cross Abstract: Despite advances in safety alignment, large language models remain vulnerable to continuously evolving jailbreaks. Existing fine-tuned safety classifiers cannot adapt to these evolving attacks, while adaptive memory-based guardrails tend to over-refuse benign queries that resemble stored attacks. We propose Membrane, a self-evolving guardrail built on Contrastive Safety Me

Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense · 相关人物