Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

Source: ArXiv CS.CV
Date: 2026-06-05
Authors: Lingao Xiao, Songhua Liu, Yang He, Xinchao Wang

Abstract

arXiv:2502.06434v2

Dataset pruning (DP) and dataset distillation (DD) fundamentally differ in their outputs: DP selects subsets of the original images, while DD generates synthetic images. DD's recent and increasing reliance on original images suggests that the two directions are converging. To investigate this trend, we propose a unified dataset compression (DC) benchmark. The benchmark reveals an interesting trade-off for soft-label DD: while soft labels provide valuable information, they can make the distillation process less essential, as distilled images may not always outperform random subsets. The benchmark also reveals that, at the current stage, dataset pruning outperforms dataset distillation at small dataset sizes. Given these observations, we explore hard-label DC as a complementary approach that emphasizes image quality while offering substantial storage efficiency.
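The storage argument for hard labels can be illustrated with a rough back-of-the-envelope sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `label_storage_bytes`, the idea that one soft-label vector is stored per augmented view, the 2-byte and 4-byte value sizes, and the example dataset dimensions are all hypothetical.

```python
def label_storage_bytes(num_images, num_classes, soft,
                        views_per_image=1, bytes_per_value=2):
    """Approximate bytes needed to store the labels of a compressed dataset.

    Hypothetical cost model (not from the paper):
    - hard labels: one 4-byte integer class index per image
    - soft labels: one probability vector of length num_classes
      per stored view of each image
    """
    if soft:
        return num_images * views_per_image * num_classes * bytes_per_value
    return num_images * 4


# Hypothetical example: 10,000 images, 1,000 classes,
# with 300 soft-labelled views stored per image.
hard = label_storage_bytes(10_000, 1_000, soft=False)
soft = label_storage_bytes(10_000, 1_000, soft=True, views_per_image=300)

print(f"hard labels: {hard} bytes")
print(f"soft labels: {soft} bytes")
print(f"ratio: {soft / hard:.0f}x")
```

Under this cost model, label storage for the soft-label variant grows with both the number of classes and the number of stored views, while the hard-label variant grows only with the number of images.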
