Deduplicating Training Data Makes Language Models Better (paper)
2022 · Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) · Cited by 251
Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Abstract
Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.