SRA: Span Representation Alignment for Large Language Model Distillation (Event)

PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM

arXiv:2605.01205v2 (Announce Type: replace)

Abstract: Cross-Tokenizer Knowledge Distillation (CTKD) enables knowledge transfer between a large language model and a smaller student, even when they employ different tokenizers. While existing approaches mainly focus on token-level alignment strategies, which are often brittle and sensitive to discrepancies between tokenizers, we argue that the method of aggregating tokens into
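To illustrate why token-level alignment is brittle across tokenizers, and how grouping tokens into larger units can sidestep the mismatch, here is a minimal sketch. It is NOT the SRA method, since the abstract above is cut off before describing it. The sketch makes several assumptions: both tokenizations concatenate exactly to the same text, spans are cut wherever both tokenizers place a boundary, span vectors are mean-pooled, and teacher and student share a hidden size (in practice a learned projection would be needed). All function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence

Vector = List[float]


def token_end_offsets(tokens: Sequence[str]) -> List[int]:
    """Character offset where each token ends (assumes non-empty tokens that concatenate to the text)."""
    return list(accumulate(len(t) for t in tokens))


def shared_boundaries(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> List[int]:
    """Offsets at which BOTH tokenizers end a token; these are safe span cut points."""
    return sorted(set(token_end_offsets(tokens_a)) & set(token_end_offsets(tokens_b)))


def group_into_spans(tokens: Sequence[str], boundaries: Sequence[int]) -> List[List[int]]:
    """Group token indices so that every span ends at a shared boundary."""
    cut_points = set(boundaries)
    spans: List[List[int]] = []
    current: List[int] = []
    for idx, end in enumerate(token_end_offsets(tokens)):
        current.append(idx)
        if end in cut_points:
            spans.append(current)
            current = []
    return spans


def mean_pool(vectors: Sequence[Vector], spans: Sequence[List[int]]) -> List[Vector]:
    """Average the per-token vectors inside each span (one hypothetical aggregation choice)."""
    pooled: List[Vector] = []
    for span in spans:
        dim = len(vectors[span[0]])
        pooled.append([sum(vectors[i][d] for i in span) / len(span) for d in range(dim)])
    return pooled


def span_mse(teacher: Sequence[Vector], student: Sequence[Vector]) -> float:
    """Mean squared error between aligned span vectors (assumes equal hidden sizes)."""
    diffs = [(t - s) ** 2 for tv, sv in zip(teacher, student) for t, s in zip(tv, sv)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Two hypothetical tokenizations of "unbelievable results" with mismatched boundaries.
    teacher_tokens = ["un", "believ", "able", " results"]
    student_tokens = ["unbel", "ievable", " res", "ults"]
    cuts = shared_boundaries(teacher_tokens, student_tokens)
    t_spans = group_into_spans(teacher_tokens, cuts)
    s_spans = group_into_spans(student_tokens, cuts)
    # Toy 1-D hidden states, one per token.
    t_pooled = mean_pool([[1.0], [2.0], [3.0], [4.0]], t_spans)
    s_pooled = mean_pool([[2.0], [4.0], [6.0], [2.0]], s_spans)
    print(cuts, t_spans, s_spans, span_mse(t_pooled, s_pooled))
```

No single teacher token lines up one-to-one with a student token here, yet both sequences split cleanly into the same two spans ("unbelievable" and " results"). That gives a one-to-one correspondence at the span level even though none exists at the token level.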