Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval 论文

2021International Conference on Learning Representations引用 335
Domain Adaptation and Few-Shot LearningTopic ModelingMultimodal Machine Learning Applications

摘要

Conducting text retrieval in a dense representation space has many intriguing advantages. Yet the end-to-end learned dense retrieval (DR) often underperforms word-based sparse retrieval. In this paper, we first theoretically show the learning bottleneck of dense retrieval is due to the domination of uninformative negatives sampled locally in batch, which yield diminishing gradient norms, large stochastic gradient variances, and slow learning convergence. We then propose Approximate nearest neighbor Negative Contrastive Learning (ANCE), a learning mechanism that selects hard training negatives globally from the entire corpus, using an asynchronously updated ANN index. Our experiments demonstrate the effectiveness of ANCE on web search, question answering, and in a commercial search environment, showing ANCE dot-product retrieval nearly matches the accuracy of BERT-based cascade IR pipeline, while being 100x more efficient. We also empirically validate our theory that negative sampling with ANCE better approximates the oracle gradient-norm based importance sampling, thus improves the convergence of stochastic training.