TrisZaska's Machine Learning Blog

The role of Learning rate in Gradient Descent

1. Introduction
2. History and Overview of Artificial Neural Networks
3. Single neural network
4. Multi-layer neural network
5. Implementing and using a Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Learning rate in Gradient Descent

As we mentioned before in the topic of the Perceptron rule, do you remember the learning rate \(\eta\)? We also discussed the general purpose of this term there. So, to answer the earlier question of why the learning rate is important, first look at the image of the three cases: \(\eta\) too small, \(\eta\) "nice", and \(\eta\) too large.
As we can see (the sketch after this list illustrates all three regimes),
  • If we choose \(\eta\) too small, the steps may get trapped in a local minimum, and training is computationally expensive because it takes many epochs to converge
  • If we choose \(\eta\) to be "nice", that's cool: our model converges and works well
  • If we choose \(\eta\) too large, the steps overshoot the global minimum, and the model keeps updating the weights until it reaches the maximum number of epochs because the error never goes down
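To make the three regimes concrete, here is a minimal sketch (not from the original post) of plain gradient descent on the toy quadratic loss \(f(w) = w^2\), whose global minimum is at \(w = 0\); the starting weight, epoch count, and the three \(\eta\) values below are illustrative assumptions.

def gradient_descent(eta, w=5.0, epochs=20):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(epochs):
        w = w - eta * 2 * w  # weight update: w := w - eta * df/dw
    return w

# Three illustrative learning rates: too small, "nice", too large
for eta in (0.01, 0.1, 1.1):
    print(f"eta = {eta:<4} -> w after 20 epochs = {gradient_descent(eta):10.4f}")
# eta = 0.01 -> w barely moves from 5.0 toward 0 (slow convergence)
# eta = 0.1  -> w ends up very close to the global minimum at 0
# eta = 1.1  -> |w| grows every epoch: the steps overshoot and diverge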
So there is no single good answer to the question of how to choose the learning rate \(\eta\) properly; in practice we rely on trial and error to get the best model we can. Some other techniques let us adjust \(\eta\) automatically, and one such technique, so-called annealing, is also discussed later in this series in the topic of the Multi-layer Neural Network.
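Since the post only names annealing, here is a hedged sketch of one common schedule, time-based decay \(\eta_t = \eta_0 / (1 + k\,t)\); the exact schedule used later in the series may differ, and the constants below are assumptions for illustration.

def annealed_eta(eta0, decay, t):
    """Learning rate at epoch t under time-based decay (one common annealing form)."""
    return eta0 / (1.0 + decay * t)

# A rate that diverges when held fixed (the third case above) can still
# converge once annealed: large early steps, ever smaller later ones.
w, eta0, decay = 5.0, 1.1, 0.1
for t in range(30):
    w -= annealed_eta(eta0, decay, t) * 2 * w  # gradient of f(w) = w**2 is 2*w
print(f"final w = {w:.6f}")  # approaches the global minimum at 0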
