Why tanh: choosing a sigmoidal function (paper)

2003, cited by 216
Neural Networks and Applications; Model Reduction and Neural Networks; Machine Learning and ELM

Abstract

As hardware implementations of backpropagation and related training algorithms are anticipated, the choice of a sigmoidal function should be carefully justified. Attention should focus on choosing an activation function for a neural unit that exhibits the best properties for training. The author argues for the use of the hyperbolic tangent. While the exact shape of the sigmoid makes little difference once the network is trained, it is shown that the hyperbolic tangent possesses particular properties that make it appealing for use during training. By paying attention to scaling, it is illustrated that tanh(1.5x) has the additional advantage of equalizing training over layers. This result generalizes easily to several standard sigmoidal functions in common use.
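The scaled activation named in the abstract can be sketched in a few lines. This is only an illustration of the function tanh(1.5x), its derivative, and its standard relation to the logistic sigmoid; it does not reproduce the paper's analysis of training across layers. The function names and the `gain` parameter are assumptions chosen for this example.

```python
import math


def scaled_tanh(x, gain=1.5):
    """Scaled hyperbolic tangent, tanh(gain * x); gain=1.5 gives tanh(1.5x)."""
    return math.tanh(gain * x)


def scaled_tanh_grad(x, gain=1.5):
    """Derivative of tanh(gain * x) with respect to x: gain * (1 - tanh^2(gain * x))."""
    t = math.tanh(gain * x)
    return gain * (1.0 - t * t)


def logistic(x):
    """Standard logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def scaled_tanh_via_logistic(x, gain=1.5):
    """Same function written with the logistic sigmoid, using the identity
    tanh(z) = 2 * logistic(2z) - 1 with z = gain * x."""
    return 2.0 * logistic(2.0 * gain * x) - 1.0


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, scaled_tanh(x), scaled_tanh_grad(x), scaled_tanh_via_logistic(x))
```

The identity used in `scaled_tanh_via_logistic` is one way to see why a result stated for a scaled tanh can carry over to other common sigmoids: the logistic function is a shifted and rescaled tanh. The slope at the origin equals the gain, so tanh(1.5x) has slope 1.5 at x = 0.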