Soro: A Lightweight Foundation Model and Chatbot for Tajik

Event: PRODUCT_LAUNCH | Date: 2026-05-28 | Impact: MEDIUM

arXiv:2605.27379v1 | Announce Type: cross

Abstract: We present Soro, a family of Tajik-specialized conversational large language models (LLMs) designed for real-world deployment under tight compute and connectivity constraints in Tajikistan. Starting from open-weight Gemma 3 checkpoints, we perform Tajik-only continual pretraining on a curated 1.9-billion-token corpus spanning filtered web text, PDF documents, and curriculum-aligned educa