Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation 文章

ArXiv CS.AI2026-05-28NEWSen作者: Jianwei Li, Jung-Eun Kim

摘要

arXiv:2605.28597v1 Announce Type: cross Abstract: This position paper argues that the AI/ML community should stop overclaiming and retire the label "positive backdoor," and instead treat trigger-activated hidden behaviors as Secret Alignment. Crucially, protective claims based on Secret Alignment should be presumed not secure by default unless supported by rigorous, standardized evaluation. The Private AI era, enabled by open-weight LLMs and accessible training/inference stacks, turns language models into privately owned digital assets, creating security concerns around unauthorized access, model theft, and behavioral misuse. Recently, a line of work framed as "positive backdoors" has been proposed to address these challenges.

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (2)