Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders (Event)
PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM
arXiv:2602.10388v3 (Announce Type: replace)

Abstract: The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance…
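The abstract contrasts text-based diversity metrics with the task-relevant features that drive downstream performance. The sketch below illustrates that contrast under loudly stated assumptions: distinct-n stands in for a text-based metric, a toy sparse autoencoder (SAE) with hand-picked weights replaces a real SAE trained on LLM activations, the "hidden states" are made-up vectors rather than outputs of any model, and `feature_coverage` is a hypothetical feature-space metric. It is not the authors' method, which the truncated abstract does not describe.

```python
from typing import List, Sequence


def distinct_n(texts: Sequence[str], n: int = 2) -> float:
    """Text-based diversity: unique n-grams divided by total n-grams.

    Tokenizes by lowercasing and splitting on whitespace.
    """
    grams = []
    for text in texts:
        toks = text.lower().split()
        grams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0


def sae_encode(x: Sequence[float],
               w_enc: Sequence[Sequence[float]],
               b_enc: Sequence[float]) -> List[float]:
    """Toy SAE encoder f(x) = ReLU(W_enc x + b_enc), one row of w_enc per feature."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]


def feature_coverage(codes: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Hypothetical feature-space diversity: fraction of SAE features whose
    activation exceeds `threshold` in at least one example."""
    if not codes:
        return 0.0
    n_features = len(codes[0])
    active = {j for code in codes for j, a in enumerate(code) if a > threshold}
    return len(active) / n_features


# Hypothetical toy setup: 3-dim "hidden states", 4 SAE features.
W_ENC = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [-1.0, -1.0, 0.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]

# Two paraphrases: no shared bigrams, but (by assumption) near-identical hidden states.
TEXTS = ["how do I sort a list in python",
         "what is the way to order a python list"]
HIDDEN = [[0.9, 0.1, -0.5],
          [0.8, 0.2, -0.4]]

codes = [sae_encode(h, W_ENC, B_ENC) for h in HIDDEN]
print("distinct-2:", distinct_n(TEXTS, n=2))                        # distinct-2: 1.0
print("feature coverage:", feature_coverage(codes, threshold=0.5))  # feature coverage: 0.25
```

Under these toy assumptions the text metric rates the pair as maximally diverse, while only one of four SAE features fires strongly on either example. Whether the paper's own diversity measure resembles this coverage count is not stated in the excerpt.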