Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Hugging Face Blog, 2024-03-20