CLIPScore: A Reference-free Evaluation Metric for Image Captioning (Paper)

2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Cited by: 937
Topics: Multimodal Machine Learning Applications, Topic Modeling, Natural Language Processing Techniques

Abstract

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in contrast to the reference-free manner in which humans assess caption quality.