Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Event: PRODUCT_LAUNCH | Date: 2026-05-27 | Impact: MEDIUM

arXiv:2605.19908v2 (announce type: replace)

Abstract: Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are similarly available at every layer in every model we
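
To make the three stylistic features named in the abstract concrete, here is a minimal Python sketch of how they might be computed from raw text. This is an illustration, not the paper's method: the function name `stylistic_features`, the tiny `FUNCTION_WORDS` list, whitespace tokenization, and the per-character definition of punctuation density are all assumptions introduced here.

```python
import string
from typing import Dict

# Hypothetical, deliberately small function-word list (an assumption for this
# sketch); a real study would likely use a much fuller inventory.
FUNCTION_WORDS = frozenset(
    {"the", "a", "an", "of", "and", "to", "in", "that", "it", "is"}
)


def stylistic_features(text: str) -> Dict[str, float]:
    """Compute three surface-level stylistic features for one text.

    Assumed definitions (not taken from the paper):
      - mean_word_length: average characters per word, after stripping
        leading and trailing punctuation from whitespace-separated tokens
      - punctuation_density: ASCII punctuation characters / all characters
      - function_word_frequency: share of words found in FUNCTION_WORDS
    """
    # Whitespace tokenization, then strip surrounding punctuation and lowercase.
    words = [tok.strip(string.punctuation).lower() for tok in text.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation

    n_words = len(words)
    n_chars = len(text)
    n_punct = sum(ch in string.punctuation for ch in text)

    return {
        "mean_word_length": (
            sum(len(w) for w in words) / n_words if n_words else 0.0
        ),
        "punctuation_density": n_punct / n_chars if n_chars else 0.0,
        "function_word_frequency": (
            sum(w in FUNCTION_WORDS for w in words) / n_words if n_words else 0.0
        ),
    }


if __name__ == "__main__":
    print(stylistic_features("The cat sat, and it purred."))
```

In a probing setup, per-text values like these would typically serve as regression targets for a simple probe trained on each layer's hidden states. That step is not shown here, and the abstract does not specify how the authors carried it out.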