The role of Learning rate in Gradient Descent

1. Introduction
2. History and overview of Artificial Neural Networks
3. Single neural network
4. Multi-layer neural network
5. Install and use a Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Learning rate in Gradient Descent

As we mentioned before in the topic of the Perceptron rule, do you remember the learning rate η? We also discussed the general purpose of this term there. So, to answer the earlier question of how important the learning rate is, first look at the figure with three cases: η too small, η "nice", and η too large.
As we can see,
  • If we choose η too small, the steps may get trapped in a local minimum, and training is computationally expensive because it takes many epochs to converge
  • If we choose η to be "nice", that's cool, our model works well
  • If we choose η too large, the steps overshoot the global minimum, and our model will keep updating the weights until it reaches the maximum epoch, because the error never goes down (a small sketch of all three cases follows this list)
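To see these three behaviors in numbers, below is a minimal sketch (in Python) of gradient descent on the toy quadratic loss f(w) = w², whose gradient is 2w and whose global minimum is at w = 0. The loss function, the starting point w = 5.0, the epoch count, and the three values of η are illustrative assumptions for this sketch, not values taken from this post:

def gradient_descent(eta, w=5.0, epochs=20):
    """Run `epochs` steps of the update w <- w - eta * f'(w) on f(w) = w**2."""
    for _ in range(epochs):
        w = w - eta * 2 * w   # gradient of w**2 is 2*w
    return w

for eta in (0.01, 0.3, 1.1):  # too small, "nice", too large
    print(f"eta = {eta:<4} -> w after 20 epochs: {gradient_descent(eta):.4f}")

Running it, η = 0.01 barely moves from the starting point (slow convergence), η = 0.3 lands essentially on the minimum, and η = 1.1 blows up because every step overshoots back and forth across the minimum.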
So, there is no single good answer to the question of how to choose the learning rate η properly; we just use trial and error to get the best model we can. Some other techniques let us choose η automatically, and one such technique, so-called Annealing, is also discussed in this post in the topic Multi-layer Neural Network; a small sketch of the idea is shown below.
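Here is one common form of annealing, sketched under the same toy assumptions as above: the learning rate decays over time as η_t = η₀ / (1 + k·t), so early steps move fast and later steps settle down. This particular schedule and the constants η₀ = 1.5 and k = 0.3 are hypothetical illustrations, not necessarily the exact scheme covered in the Multi-layer Neural Network topic:

def annealed_gradient_descent(eta0=1.5, k=0.3, w=5.0, epochs=20):
    """Gradient descent on f(w) = w**2 with a decaying learning rate."""
    for t in range(epochs):
        eta = eta0 / (1.0 + k * t)  # eta shrinks as the epochs go by
        w = w - eta * 2 * w         # same update rule as before
    return w

print(f"w after 20 annealed epochs: {annealed_gradient_descent():.2e}")

With this schedule the first few steps overshoot because η starts out too large, but as η shrinks the updates calm down and w still ends up very close to the global minimum.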
