InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (paper)
2024 · Citations: 389
Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques