Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion

arXiv cs.CV | 2026-06-02 | Authors: DongQing Liu, MengShi Qi, HongWei Ji

Abstract

arXiv:2606.02450v1 (announce type: new)

CoVR-R studies reason-aware composed video retrieval: given a reference video and an edit instruction, the system must retrieve the target video that satisfies the edit. The main difficulty is that the target is not described directly; it must be inferred from fine-grained changes in object identity, action order, final state, hand interaction, and scene transition. We build a zero-shot reason-then-retrieve pipeline around Qwen3.5-27B. For each gallery video, the model generates a retrieval-oriented structured description and a dense embedding by pooling generated-token hidden states with token-dependent weights. For each query, the model first performs edit reasoning over the reference video and instruction, then generates a target-video description whose hidden states serve as the query embedding. We complement dense retrieval with a TF-IDF branch over the generated texts and fuse the two rankings with split-specific weights.
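The abstract does not spell out the pooling weights, the TF-IDF variant, or the fusion formula, so the following is only a minimal sketch of the retrieval stage under stated assumptions. It assumes hidden states and generated descriptions are already available, uses a plain weighted mean for pooling, whitespace tokenization with smoothed IDF for the sparse branch, and min-max-normalized linear score interpolation for fusion. All function names and the `alpha` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def weighted_pool(hidden_states, weights):
    """Pool per-token vectors into one embedding using a weighted mean.

    `weights` stands in for the token-dependent weights; how the paper
    derives them is not stated, so they are plain inputs here.
    """
    total = sum(weights)
    dim = len(hidden_states[0])
    return [
        sum(w * h[d] for h, w in zip(hidden_states, weights)) / total
        for d in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_idf(texts):
    """Smoothed inverse document frequency over whitespace tokens (assumed variant)."""
    n = len(texts)
    df = Counter(term for t in texts for term in set(t.lower().split()))
    return {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen at fit time are dropped."""
    tf = Counter(text.lower().split())
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def sparse_cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the two branches are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_rank(dense_scores, sparse_scores, alpha):
    """Linear fusion (an assumption): alpha weights dense, 1 - alpha weights sparse.

    Returns gallery indices sorted from best to worst fused score.
    """
    d, s = min_max(dense_scores), min_max(sparse_scores)
    fused = [alpha * x + (1.0 - alpha) * y for x, y in zip(d, s)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


def retrieve(query_emb, query_text, gallery_embs, gallery_texts, alpha=0.5):
    """Score every gallery item with both branches, then fuse."""
    idf = fit_idf(gallery_texts)
    q_sparse = tfidf(query_text, idf)
    dense = [cosine(query_emb, g) for g in gallery_embs]
    sparse = [sparse_cosine(q_sparse, tfidf(t, idf)) for t in gallery_texts]
    return fuse_and_rank(dense, sparse, alpha)
```

In this sketch, "split-specific weights" would correspond to choosing a different `alpha` for each benchmark split. Rank-level fusion (for example, a weighted sum of reciprocal ranks) would be an equally plausible reading of "fuse the two rankings".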
