Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding (paper)
2023 · Citations: 438
Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques