Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models (Paper)

2009, Neural Information Processing Systems, cited by 243
Neural Networks and Applications; Generative Adversarial Networks and Image Synthesis; Stochastic Gradient Optimization Techniques

Abstract

Training conditional maximum entropy models on massive data sets requires significant computational resources. We examine three common distributed training methods for conditional maxent: a distributed gradient computation method, a majority vote method, and a mixture weight method. We analyze and compare the CPU and network time complexity of each of these methods and present a theoretical analysis of conditional maxent models, including a study of the convergence of the mixture weight method, the most resource-efficient technique. We also report the results of large-scale experiments comparing these three methods, which demonstrate the benefits of the mixture weight method: it consumes fewer resources while achieving performance comparable to that of standard approaches.
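To make the mixture weight idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes the binary special case of conditional maxent (L2-regularized logistic regression), trains one model per data shard independently, and then averages the weight vectors uniformly. Plain batch gradient descent stands in for whatever optimizer the authors used, and every name here (`train_maxent`, `mixture_weight`, `make_data`, the hyperparameters, the synthetic data) is hypothetical.

```python
import math
import random


def train_maxent(data, dim, lam=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on an L2-regularized binary maxent
    (logistic regression) objective. Assumed optimizer, for illustration."""
    w = [0.0] * dim
    n = len(data)
    for _ in range(epochs):
        # Start from the gradient of the regularizer (lam / 2) * ||w||^2.
        grad = [lam * wj for wj in w]
        for x, y in data:
            z = sum(wj * xj for wj, xj in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # model probability of y = 1
            for j in range(dim):
                grad[j] += (p - y) * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mixture_weight(shards, dim, **train_kwargs):
    """Train one model per shard with no communication during training,
    then return the uniform average of the resulting weight vectors."""
    models = [train_maxent(shard, dim, **train_kwargs) for shard in shards]
    p = len(models)
    return [sum(m[j] for m in models) / p for j in range(dim)]


def make_data(n, rng):
    """Synthetic binary data drawn from a logistic model (illustrative only)."""
    true_w = [2.0, -1.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        z = sum(a * b for a, b in zip(true_w, x))
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
        data.append((x, y))
    return data


rng = random.Random(0)
data = make_data(400, rng)
shards = [data[i::4] for i in range(4)]   # 4 simulated workers

w_mix = mixture_weight(shards, dim=2)     # one averaging step at the end
w_full = train_maxent(data, dim=2)        # single-machine baseline
print("mixture weights:", w_mix)
print("full-data weights:", w_full)
```

In this sketch the only data exchanged between workers is one weight vector per shard at the very end. A distributed gradient approach would instead need to combine partial gradients at every iteration, which is the kind of network cost the abstract's comparison refers to.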