1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Learning via types of Gradient Descent
We've already implemented the Perceptron and Adaline models, but there is a subtle difference in how they update \(w\). Look again at equation \((2)\), which implements the Perceptron rule,
\(\Delta w_j = \eta\,(target^{(i)} - output^{(i)})\,x^{(i)}_j\)
and at equation \((6)\), which implements the Delta rule,
\(\Delta w_j = \eta \sum_i \left(y^{(i)} - \phi(z^{(i)})\right) x^{(i)}_j\)
Can you see the difference between the two equations? With the Perceptron, we update the weights incrementally after each individual training sample within an epoch; this is the so-called Stochastic Gradient Descent (or Online Learning). Adaline, on the other hand, uses the whole training dataset in each epoch to compute a single update; this is the so-called Batch Gradient Descent. The sketch below makes the contrast concrete. So, what are the advantages and disadvantages of each?
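Here is a minimal NumPy sketch of one training epoch under each scheme. The toy data, the variable names, and the omission of the bias term are simplifications for illustration, not the exact classes we built earlier in this series.

```python
import numpy as np

# Toy data and hyperparameters (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
eta = 0.01

def perceptron_epoch(w, X, y):
    """Stochastic / Online Learning: update w after every single sample, as in equation (2)."""
    for xi, target in zip(X, y):
        output = 1 if np.dot(xi, w) >= 0.0 else -1   # unit step activation
        w = w + eta * (target - output) * xi          # per-sample weight update
    return w

def adaline_epoch(w, X, y):
    """Batch Gradient Descent: one update per epoch over the whole dataset, as in equation (6)."""
    output = X.dot(w)                                 # linear activation, phi(z) = z
    errors = y - output
    w = w + eta * X.T.dot(errors)                     # update summed over all samples
    return w

w_sgd = perceptron_epoch(np.zeros(X.shape[1]), X, y)
w_batch = adaline_epoch(np.zeros(X.shape[1]), X, y)
```

Notice that the stochastic version needs a Python loop over samples, while the batch version is a single vectorized matrix operation; this trade-off is exactly what the points below are about.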
- Imagine you're an AI Engineer at Google applying Machine Learning on the YouTube servers, which serve billions of users every day. That is quite a challenge! With such a huge dataset, training on all of it in every epoch, over and over again to minimize the error, takes a huge amount of memory and execution time and is computationally very expensive. The solution: instead of loading the whole dataset in every epoch, we feed in one sample at a time to update the weights. This is where Stochastic Gradient Descent comes to help.
- Another advantage is that Online Learning can update the weights immediately as new data samples come in. This is very useful if you're working on a web application that interacts with users in real time.
- Stochastic Gradient Descent traverses the error surface more noisily than Batch Gradient Descent and may not settle exactly at the global optimum, but don't worry: even when it does not reach the global minimum, it usually ends up very close to it.
- Finally, Mini-batch Gradient Descent is a compromise between the two: we avoid the per-sample "for" loops of Online Learning, and we don't have to load the whole dataset as in Batch Gradient Descent. Mini-batch Gradient Descent lets us use a subset of the whole training dataset, so we can vectorize the computation (see the sketch after this list). If \(k\) is the size of the subset of training samples,
\(1 < k < m\)
where \(m\) is the total number of training samples.
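As a rough illustration (not the exact implementation we will use later in this series), one mini-batch epoch for an Adaline-style linear unit might look like the sketch below; the function name, the default \(k = 32\), and the toy data are assumptions made for this example.

```python
import numpy as np

def minibatch_epoch(w, X, y, eta=0.01, k=32):
    """One epoch of Mini-batch Gradient Descent for an Adaline-style linear unit."""
    m = X.shape[0]
    idx = np.random.permutation(m)            # shuffle so each mini-batch is representative
    for start in range(0, m, k):
        batch = idx[start:start + k]          # k samples per update (last batch may be smaller)
        output = X[batch].dot(w)              # vectorized forward pass over the mini-batch
        errors = y[batch] - output
        w = w + eta * X[batch].T.dot(errors)  # Delta-rule update summed over the mini-batch
    return w

# Toy usage with made-up data, satisfying 1 < k < m.
X = np.random.default_rng(1).normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = minibatch_epoch(np.zeros(X.shape[1]), X, y, eta=0.01, k=32)
```

With \(k\) samples per update we still get the vectorized matrix operations of Batch Gradient Descent, while updating the weights far more often per epoch, which is why mini-batches are the default choice in practice.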