TrisZaska's Machine Learning Blog

The role of Learning rate in Gradient Descent

1. Introduction
2. History and Overview of Artificial Neural Networks
3. Single neural network
4. Multi-layer neural network
5. Implementing and using a Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Learning rate in Gradient Descent

As we mentioned before in the topic of the Perceptron rule, do you remember the learning rate \(\eta\)? We also discussed the general purpose of this term there. So, to answer the earlier question of why the learning rate is important, first look at the image of the three cases: \(\eta\) too small, \(\eta\) "nice", and \(\eta\) too large.
As we can see (the sketch after this list illustrates all three regimes),
  • If we choose \(\eta\) too small, the steps may get trapped in a local minimum, and training is computationally expensive because it takes many epochs to converge
  • If we choose \(\eta\) to be "nice", that's cool: our model converges and works well
  • If we choose \(\eta\) too large, the steps overshoot the global minimum, and the model keeps updating the weights until it reaches the maximum number of epochs because the error never goes down
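To make the three regimes concrete, here is a minimal sketch (not from the original post) of plain gradient descent on the toy quadratic loss \(f(w) = w^2\), whose global minimum is at \(w = 0\); the starting weight, epoch count, and the three \(\eta\) values below are illustrative assumptions.

def gradient_descent(eta, w=5.0, epochs=20):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(epochs):
        w = w - eta * 2 * w  # weight update: w := w - eta * df/dw
    return w

# Three illustrative learning rates: too small, "nice", too large
for eta in (0.01, 0.1, 1.1):
    print(f"eta = {eta:<4} -> w after 20 epochs = {gradient_descent(eta):10.4f}")
# eta = 0.01 -> w barely moves from 5.0 toward 0 (slow convergence)
# eta = 0.1  -> w ends up very close to the global minimum at 0
# eta = 1.1  -> |w| grows every epoch: the steps overshoot and diverge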
So there is no single good answer to the question of how to choose the learning rate \(\eta\) properly; in practice we rely on trial and error to get the best model we can. Some other techniques let us adjust \(\eta\) automatically, and one such technique, so-called annealing, is also discussed later in this series in the topic of the Multi-layer Neural Network.
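Since the post only names annealing, here is a hedged sketch of one common schedule, time-based decay \(\eta_t = \eta_0 / (1 + k\,t)\); the exact schedule used later in the series may differ, and the constants below are assumptions for illustration.

def annealed_eta(eta0, decay, t):
    """Learning rate at epoch t under time-based decay (one common annealing form)."""
    return eta0 / (1.0 + decay * t)

# A rate that diverges when held fixed (the third case above) can still
# converge once annealed: large early steps, ever smaller later ones.
w, eta0, decay = 5.0, 1.1, 0.1
for t in range(30):
    w -= annealed_eta(eta0, decay, t) * 2 * w  # gradient of f(w) = w**2 is 2*w
print(f"final w = {w:.6f}")  # approaches the global minimum at 0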
