Committed SAE-Feature Traces for Audited-Session Substitution Detection in Hosted LLMs
Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM
arXiv:2604.18179v2 | Announce Type: replace-cross
Abstract: Hosted-LLM providers have a silent-substitution incentive: advertise a stronger model while serving cheaper replies. Probe-after-return schemes such as SVIP leave a parallel-serve side-channel, since a dishonest provider can route the verifier's probe to the advertised model while serving ordinary users from a substitute. We propose a commit-open proto