Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

arXiv cs.AI · 2026-05-29 · Authors: Junyoung Park, Sunghwan Park, Seongyong Ju, Jaewoo Lee

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures · Related Technologies