1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Learning rate in Gradient Descent
As we mentioned earlier in the section on the Perceptron rule, the learning rate \(\eta\) determines how big a step we take each time the weights are updated; in the Gradient Descent (Delta) rule, the whole update is scaled by it. So how important is the choice of \(\eta\) in practice? Look at the images illustrating three cases: \(\eta\) too small, \(\eta\) "nice", and \(\eta\) too large. As we can see,
- If we choose \(\eta\) too small, each step is tiny: training becomes computationally expensive because it takes many epochs to converge, and the descent may even get stuck in a local minimum along the way.
- If we choose \(\eta\) to be "nice" (well chosen), the cost decreases steadily and the model converges as expected.
- If we choose \(\eta\) too large, the steps overshoot the global minimum: the error never goes down, so the weights keep being updated until the maximum number of epochs is reached. A small code sketch comparing these three cases follows below.
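
Here is a minimal sketch of that comparison, assuming an Adaline-style batch gradient descent on the sum-of-squared-errors cost. The `AdalineGD` class, the toy data, and the specific \(\eta\) values are illustrative choices for this sketch, not the article's own code:

```python
import numpy as np

class AdalineGD:
    """Adaline-style batch gradient descent on the sum-of-squared-errors cost."""

    def __init__(self, eta=0.001, n_epochs=20):
        self.eta = eta            # learning rate
        self.n_epochs = n_epochs  # number of passes over the training data

    def fit(self, X, y):
        self.w_ = np.zeros(X.shape[1] + 1)  # weights; w_[0] is the bias term
        self.cost_ = []                     # SSE cost recorded at each epoch
        for _ in range(self.n_epochs):
            output = X.dot(self.w_[1:]) + self.w_[0]   # net input (linear activation)
            errors = y - output
            # Delta rule: step along the negative gradient, scaled by eta
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            self.cost_.append((errors ** 2).sum() / 2.0)
        return self

# Toy 2-feature data: two Gaussian blobs labelled -1 and +1 (illustrative only)
rng = np.random.RandomState(1)
X = np.vstack([rng.normal(-1.0, 0.3, (50, 2)), rng.normal(1.0, 0.3, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# Compare the three cases: too small, "nice", too large
for eta in (1e-5, 1e-3, 1e-1):
    costs = AdalineGD(eta=eta).fit(X, y).cost_
    print(f"eta={eta:g}: cost at epoch 1 = {costs[0]:.3e}, at epoch 20 = {costs[-1]:.3e}")
```

With values like these we would expect the first run to barely reduce the cost after 20 epochs, the second to converge smoothly, and the third to make the cost grow from epoch to epoch instead of shrinking, which is exactly the overshooting behaviour described above.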