When Behavioral Safety Evaluation Fails: A Representation-Level Perspective (Event)

Name: When Behavioral Safety Evaluation Fails: A Representation-Level Perspective
Start: 2026-06-09

PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.08044v1
Announce Type: cross
Abstract: Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limited evidence of internal robustness, as these evaluations target outputs rather than representation-level vulnerability under intervention. We formalize this discrepancy as the audit gap: the difference between behavioral safety and robustness under intervention. To study

Artificial Intelligence
