1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
4. Multi-layer Neural Network
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Forward Propagation
Alright, forward propagation is very easy to understand since we already went through the basic Single Neural Network; if you jumped straight to this topic, don't worry, because it requires very little background. Basically, it is still a linear combination of the inputs and weights: the neurons in a Multi-layer Neural Network are fully connected, so the output of the previous layer becomes the input of the next layer. It is always better to visualize something with an illustration before going to the mathematics, so look at the image above. Remember, our Multi-layer Neural Network has only 3 layers, and note that there is no connection between the biases of two layers. Beginning at the input layer, we calculate the net input of neuron \(a^{(2)}_1\),
\(z^{(2)}_1 = w^{(1)}_{1,0}x_0 + w^{(1)}_{1,1}x_1 + w^{(1)}_{1,2}x_2\)
\(a^{(2)}_1 = \phi\left(z^{(2)}_1\right) \hspace{1cm} (8)\)
Where,
- \(z^{(2)}_1\) is the net input of neuron 1 in layer 2
- \(w^{(1)}_{1,0}\) is the weight between neuron 1 in layer 2 and input \(x_0\) in layer \(1\)
- \(w^{(1)}_{1,1}\) is the weight between neuron 1 in layer 2 and input \(x_1\) in layer \(1\)
- \(w^{(1)}_{1,2}\) is the weight between neuron 1 in layer 2 and input \(x_2\) in layer \(1\)
- \(a^{(2)}_1\) is the output of neuron 1 in layer 2
- \(\phi()\) is the activation function of neuron 1 in layer 2; here we use the Sigmoid function:
\(\phi\left(z^{(2)}_1\right) = \frac{1}{1 + e^{-z^{(2)}_1}}\)
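Just to make this concrete, here is a tiny NumPy sketch of equation (8) for this one neuron; the feature and weight values are made-up numbers for illustration only:

```python
import numpy as np

def sigmoid(z):
    # squashes z into the range (0.0, 1.0)
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.5, -1.2])    # [x_0, x_1, x_2], with x_0 = 1 for the bias
w1 = np.array([0.1, 0.3, -0.2])    # [w_10, w_11, w_12] for neuron 1 in layer 2

z21 = np.dot(w1, x)                # net input z^(2)_1
a21 = sigmoid(z21)                 # activation a^(2)_1
print(z21, a21)
```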
Here is the graph of the Sigmoid function: basically, it squashes the input value into the range 0.0 to 1.0, and its center is 0.5 when the input value is 0. So far we have only calculated the net input and activation of one neuron, \(a^{(2)}_1\); we must do this for all neurons except the input neurons (which hold the raw data) and the bias neurons. For convenience when implementing, we will also use the matrix-vector representation,
\(\mathbf{z^{(2)} = w^{(1)}x^{(1)}}\)
\(\mathbf{a^{(2)} = \phi\left(z^{(2)}\right)}\)
Where,
- \(\mathbf{x^{(1)}}\) is an \([m + 1] \times 1\) dimensional feature vector, where \(m\) is the number of features and the extra 1 accounts for the bias.
- \(\mathbf{w^{(1)}}\) is an \(h \times [m + 1]\) dimensional weight matrix, where \(h\) is the number of neurons in the hidden layer.
- \(\mathbf{z^{(2)}}\) is an \(h \times 1\) dimensional vector, because \(\mathbf{z^{(2)} = w^{(1)}x^{(1)}}\): an \(h \times [m + 1]\) dimensional matrix multiplied by an \([m + 1] \times 1\) dimensional feature vector gives an \(h \times 1\) dimensional vector. If you don't know why, you can reference this page: Matrix vector multiplication.
- \(\mathbf{a^{(2)}}\) is the activation of \(\mathbf{z^{(2)}}\), plus one bias unit, therefore \(\mathbf{a^{(2)}}\) is an \([h + 1] \times 1\) dimensional vector.
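To see these shapes in code, here is a short NumPy sketch of this single-sample step; the sizes (\(m = 2\) features, \(h = 3\) hidden neurons) and the random weights are just assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, h = 2, 3                            # number of features, number of hidden neurons

x1 = np.array([1.0, 0.5, -1.2])        # [m + 1] x 1 feature vector, x1[0] = 1 is the bias
W1 = np.random.randn(h, m + 1) * 0.1   # h x [m + 1] weight matrix

z2 = W1.dot(x1)                        # h x 1 net input vector
a2 = np.concatenate(([1.0], sigmoid(z2)))   # [h + 1] x 1 activation vector, bias prepended

print(z2.shape, a2.shape)              # (3,) (4,)
```

So far this handles one training sample at a time. To process all \(n\) training samples in one pass, we stack them into matrices and write: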
\(\mathbf{Z^{(2)} = W^{(1)}X^{(1)}}\)
\(\mathbf{A^{(2)} = \phi\left(Z^{(2)}\right)}\)
Where,
- \(\mathbf{X^{(1)}}\) is an \(n \times [m + 1]\) dimensional feature matrix, where the number of rows \(n\) is the number of training samples and the number of columns \(m + 1\) is the number of features plus the bias.
- \(\mathbf{W^{(1)}}\) is an \(h \times [m + 1]\) dimensional weight matrix, where \(h\) is the number of neurons in the hidden layer.
- \(\mathbf{Z^{(2)}}\) is an \(h \times n\) dimensional matrix, obtained from \(\mathbf{Z^{(2)} = W^{(1)}X^{(1)}}\).
- \(\mathbf{A^{(2)}}\) is the activation of \(\mathbf{Z^{(2)}}\), then plus one bias, therefore \(\mathbf{A^{(2)}}\) is an \([h + 1] \times n\) dimensional matrix.
Note that for this multiplication to work, we first have to transpose \(\mathbf{X^{(1)}}\) to \([m + 1] \times n\) dimensions; then the \(h \times [m + 1]\) dimensional weight matrix multiplied by the \([m + 1] \times n\) dimensional feature matrix gives the correct result, an \(h \times n\) dimensional matrix. So, we are almost done. Doing the same for the output layer, we have
\(\mathbf{Z^{(3)} = W^{(2)}A^{(2)}}\)
\(\mathbf{A^{(3)} = \phi\left(Z^{(3)}\right)}\)
Where,
- \(\mathbf{W^{(2)}}\) is a \(t \times [h + 1]\) dimensional weight matrix, where \(t\) is the number of neurons in the output layer and \(h + 1\) is the number of hidden neurons plus the bias.
- \(\mathbf{Z^{(3)}}\) is a \(t \times n\) dimensional matrix, because \(\mathbf{Z^{(3)} = W^{(2)}A^{(2)}}\): the \(t \times [h + 1]\) dimensional weight matrix multiplied by the \([h + 1] \times n\) dimensional activation matrix gives a \(t \times n\) dimensional matrix.
- \(\mathbf{A^{(3)}}\) has the same dimensions as \(\mathbf{Z^{(3)}}\), because we just map \(\mathbf{Z^{(3)}}\) through the Sigmoid function to obtain \(\mathbf{A^{(3)}}\).
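Putting both layers together, here is a hedged NumPy sketch of the whole forward pass for a batch of \(n\) samples; the function name `forward`, the toy sizes, and the random weights are illustrative assumptions, not the exact code of the later implementation:

```python
import numpy as np

def sigmoid(Z):
    # Sigmoid activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-Z))

def forward(X, W1, W2):
    """Forward propagation for a batch of samples.

    X  : n x m feature matrix (without the bias column)
    W1 : h x [m + 1] weight matrix between input and hidden layer
    W2 : t x [h + 1] weight matrix between hidden and output layer
    """
    n = X.shape[0]
    # add the bias column and transpose, so X1 is [m + 1] x n
    X1 = np.hstack((np.ones((n, 1)), X)).T
    Z2 = W1.dot(X1)                         # h x n net input of the hidden layer
    A2 = sigmoid(Z2)
    A2 = np.vstack((np.ones((1, n)), A2))   # [h + 1] x n, bias row added
    Z3 = W2.dot(A2)                         # t x n net input of the output layer
    A3 = sigmoid(Z3)                        # t x n output of the network
    return Z2, A2, Z3, A3

# toy example: n = 4 samples, m = 2 features, h = 3 hidden neurons, t = 2 outputs
rng = np.random.RandomState(1)
X  = rng.randn(4, 2)
W1 = rng.randn(3, 2 + 1) * 0.1
W2 = rng.randn(2, 3 + 1) * 0.1

Z2, A2, Z3, A3 = forward(X, W1, W2)
print(A3.shape)   # (2, 4), i.e. t x n
```

The two things to keep in mind are adding the bias unit before each multiplication and transposing the feature matrix, so the dimensions line up exactly as described above.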