FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding (Event)

PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.19846v3 | Announce Type: replace

Abstract: Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fine-grained comprehension crucial for real-world applications requiring nuanced interpretation of human actions and interactions. While some recent human-centric benchmarks evaluate aspects of model beh