Deduplicating Training Data Makes Language Models Better (Paper)

2022 · Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · Cited by 251
Topics: Topic Modeling; Natural Language Processing Techniques; Data Quality and Management

Abstract

Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.