InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (paper)

2024 · Citations: 389
Topics: Multimodal Machine Learning Applications; Human Pose and Action Recognition; Advanced Image and Video Retrieval Techniques
