Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories (Event)

Name: Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories
Start: 2026-06-04

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2606.04778v1 (Announce Type: cross)

Abstract: Safety-aligned Large Language Models (LLMs) remain vulnerable to interventions during inference that redirect generation toward harmful outputs. Recent work attributes this to shallow safety, where alignment concentrates in the first few output tokens. We show that shallow safety is a special case of a broader inference-time vulnerability, in which short

Artificial Intelligence
