For Valid Generalization the Size of the Weights is More Important than the Size of the Network (Paper)

1996. Cited by: 221

Artificial Intelligence, Neural Networks and Applications, Machine Learning and Data Classification, Machine Learning and Algorithms

Abstract

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. More specifically, consider an $\ell$-layer feed-forward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by $A$. The misclassification probability converges to an error estimate (that is closely related to squared error on the training set) at rate $O\left((cA)^{\ell(\ell+1)/2}\sqrt{(\log n)/m}\right)$ ignoring log factors, where $m$ is the number of training patterns, $n$ is the input dimension, and $c$ is a constant. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early st...
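To make the quantities in the abstract concrete, the following is a minimal Python sketch, not code from the paper. It computes the per-unit weight bound A for a small network and evaluates the leading term of the stated rate. The function names `max_unit_l1_norm` and `rate_term` are hypothetical, the constant c is unspecified in the abstract and is set to 1.0 purely for illustration, and whether bias terms count toward a unit's weights is an assumption left to the caller.

```python
import math

def max_unit_l1_norm(layers):
    """Return A: the largest sum of weight magnitudes over all units.

    `layers` is a list of layers; each layer is a list of units, and each
    unit is a list of its incoming weights (assumed representation).
    """
    return max(sum(abs(w) for w in unit) for layer in layers for unit in layer)

def rate_term(A, num_layers, n, m, c=1.0):
    """Leading term (cA)^(l(l+1)/2) * sqrt(log(n) / m) of the stated rate.

    Log factors and the hidden constant of the O(.) are dropped; c is an
    illustrative placeholder, not a value taken from the paper.
    """
    exponent = num_layers * (num_layers + 1) / 2
    return (c * A) ** exponent * math.sqrt(math.log(n) / m)

# A toy 2-layer network: two hidden units with two inputs, one output unit.
toy_layers = [
    [[0.5, -0.5], [0.25, 0.25]],  # hidden layer
    [[1.0, -1.0]],                # output layer
]
A = max_unit_l1_norm(toy_layers)
term = rate_term(A, num_layers=2, n=100, m=10_000)
```

Note that the number of weights never enters `rate_term`: only A, the depth, the input dimension n, and the sample size m appear, which is the point the abstract makes. The term shrinks as m grows and grows quickly with A and depth.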

Authors

Peter L. Bartlett
