Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (Paper)

2024 · Citations: 231
Human Pose and Action Recognition · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection · Authors