1. Introduction
2. History and Overview of Artificial Neural Networks
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
4. Multi-layer Neural Network
- 4.1 Overview of Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.6.1 Adaptive Learning Rate (Annealing)
- 4.6.2 Momentum Terms
- 4.6.3 Regularization
- 4.7 Multi-layer Neural Network for binary/multi classification
5. Multi-layer Neural Network on the MNIST dataset
- 5.1 Overview of MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Adaptive Learning Rate (Annealing)
In practice, on complex problems, we often do not use a fixed learning rate, especially when training a Neural Network with Mini-batch Gradient Descent: the error surface is very noisy because we use only a fraction of the whole dataset for each update. With a fixed learning rate, our model keeps too much "energy" while we try to "climb down the hill" on the error surface, so it can overshoot the optimum. This is where the annealing learning rate (or learning rate decay) comes to help us. So, what is an annealing learning rate? It is a technique that lets us decrease the learning rate \(\eta\) after every epoch, meaning that \(\eta^{(i)} > \eta^{(i + 1)}\), where \(i\) is the \(i^{th}\) epoch. Let's take a look at the common schemes (a short Python sketch of them follows the list):
- Decrease the learning rate after every \(m\) epochs, where \(m\) is a pre-defined number of epochs. E.g., after every 20 epochs, we decrease the learning rate once by a pre-defined value, such as half or 0.03, etc.
- Decrease the learning rate in every epoch:
  - Using Exponential Decay with the formula \(\eta^{(i)} = \eta^{(0)} e^{-ci}\)
  - Using Fraction Decay with the formula \(\eta^{(i)} = \frac{\eta^{(0)}}{1 + ci}\)
  - Where \(c\) is a constant value and \(i\) is the \(i^{th}\) epoch
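Here is a minimal sketch of these three schedules in Python. The function names, the initial rate \(\eta^{(0)} = 0.1\), the decay constant \(c = 0.01\), and the step-decay settings are illustrative assumptions, not values from this post:

```python
import numpy as np

# A sketch of the three annealing schedules above; all constants are assumed.

def step_decay(eta0, epoch, m=20, drop=0.5):
    # Decrease the learning rate once every m epochs (here: halve it).
    return eta0 * drop ** (epoch // m)

def exponential_decay(eta0, epoch, c=0.01):
    # Exponential Decay: eta^(i) = eta^(0) * e^(-c * i)
    return eta0 * np.exp(-c * epoch)

def fraction_decay(eta0, epoch, c=0.01):
    # Fraction Decay: eta^(i) = eta^(0) / (1 + c * i)
    return eta0 / (1.0 + c * epoch)

# Compare the schedules over a few epochs.
eta0 = 0.1
for epoch in (0, 20, 40, 60, 80, 100):
    print(epoch,
          round(step_decay(eta0, epoch), 5),
          round(exponential_decay(eta0, epoch), 5),
          round(fraction_decay(eta0, epoch), 5))
```

In a training loop, you would recompute \(\eta\) like this at the start of each epoch, before performing that epoch's weight updates.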