Learning rate schedules for faster stochastic gradient search (paper)
2003, 217 citations
Stochastic Gradient Optimization Techniques; Neural Networks and Applications; Domain Adaptation and Few-Shot Learning
Abstract
The authors propose a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence for stochastic gradient descent. Empirical tests agree with theoretical expectations that drift can be used to determine whether the crucial parameter c is large enough. Using this statistic, it will be possible to produce the first adaptive learning rates which converge at optimal speed.
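
The abstract does not say how c enters the learning rate. The sketch below assumes the classical schedule eta_t = c / t, which the abstract does not state, and runs it on a hypothetical one-dimensional quadratic with noisy gradients to show why c must be large enough: when c falls below roughly 1 / (2 * curvature), the error decays much more slowly than the optimal 1/t rate. The function names, the toy objective, and all numeric settings are illustrative assumptions. The drift statistic itself is not reproduced here, because the abstract does not define it.

```python
import random


def final_squared_error(c, curvature=1.0, noise_std=1.0, steps=2000, w0=1.0, seed=0):
    """Run SGD on f(w) = 0.5 * curvature * w**2 with the assumed schedule eta_t = c / t.

    Returns the squared distance to the optimum (w = 0) after the last step.
    """
    rng = random.Random(seed)
    w = w0
    for t in range(1, steps + 1):
        # Noisy gradient: true gradient plus zero-mean Gaussian noise.
        grad = curvature * w + rng.gauss(0.0, noise_std)
        w -= (c / t) * grad
    return w * w


def mean_squared_error(c, runs=50, **kwargs):
    """Average the final squared error over several independently seeded runs."""
    return sum(final_squared_error(c, seed=s, **kwargs) for s in range(runs)) / runs


# With curvature = 1, the threshold for the optimal rate is c > 0.5 (an assumption
# taken from classical stochastic approximation theory, not from the abstract).
mse_small_c = mean_squared_error(0.1)  # c too small
mse_large_c = mean_squared_error(1.0)  # c large enough

print(f"c = 0.1: mean squared error {mse_small_c:.5f}")
print(f"c = 1.0: mean squared error {mse_large_c:.5f}")
```

In this toy setting the run with c = 1.0 ends far closer to the optimum than the run with c = 0.1. That gap is what a test for "c large enough", such as the drift statistic the abstract mentions, would be meant to detect without knowing the curvature in advance.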