1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Forward Propagation
Alright, forward propagation is easy to understand once you have gone through the basic Single Neural Network; if you are interested in this topic and jumped straight here, don't worry, it requires very little background. Basically, it is still a linear combination of the inputs and the weights: since the neurons in a Multi-layer Neural Network are fully connected, the output of the previous layer becomes the input of the next layer. It is always better to visualize something with an illustration before going into its mathematics, and here it is. Remember that our Multi-layer Neural Network has just 3 layers, and note that there is no connection between the biases of two layers. Looking at the image above and beginning at the input layer, we start by calculating the net input of neuron \(a^{(2)}_1\),
\(z^{(2)}_1 = w^{(1)}_{1,0}x_0 + w^{(1)}_{1,1}x_1 + w^{(1)}_{1,2}x_2\)
\(a^{(2)}_1 = \phi\left(z^{(2)}_1\right) \hspace{1cm} (8)\)
Where,
- \(z^{(2)}_1\) is the net input of neuron 1 in layer 2
- \(w^{(1)}_{1,0}\) is the weight between neuron 1 in layer 2 and input \(x_0\) in layer \(1\)
- \(w^{(1)}_{1,1}\) is the weight between neuron 1 in layer 2 and input \(x_1\) in layer \(1\)
- \(w^{(1)}_{1,2}\) is the weight between neuron 1 in layer 2 and input \(x_2\) in layer \(1\)
- \(a^{(2)}_1\) is the output of neuron 1 in layer 2
- \(\phi()\) is the activation function of neuron 1 in layer 2; in this tutorial we use the Sigmoid function,
\(\phi\left(z^{(2)}_1\right) = \frac{1}{1 + e^{-z^{(2)}_1}}\)
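As a quick sketch of equation (8), here is how the net input and Sigmoid activation of a single hidden neuron could be computed with NumPy; the feature and weight values below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into the range (0.0, 1.0)."""
    return 1.0 / (1.0 + np.exp(-z))

# made-up sample with 2 features, plus the bias input x0 = 1
x = np.array([1.0, 0.5, -1.2])      # [x0, x1, x2]
w1 = np.array([0.1, 0.3, -0.2])     # [w_{1,0}, w_{1,1}, w_{1,2}] of hidden neuron 1

z1 = np.dot(w1, x)                  # net input z^(2)_1, equation (8)
a1 = sigmoid(z1)                    # activation a^(2)_1
print(z1, a1)
```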
Here is the graph of the Sigmoid function: basically, it squashes the input value into the range 0.0 to 1.0, and the output is 0.5 when the input value is 0. So far we have only calculated the net input and activation of one neuron, \(a^{(2)}_1\); we must do the same for all neurons except the input neurons (which hold the raw data) and the bias neurons. For convenience when implementing, we will also use the matrix-vector representation,
\(\mathbf{z^{(2)} = w^{(1)}x^{(1)}}\)
\(\mathbf{a^{(2)} = \phi\left(z^{(2)}\right)}\)
Where,
- \(\mathbf{x^{(1)}}\) is an [m + 1] x 1 dimensional features vector, where m is the number of features and the extra 1 is the bias unit.
- \(\mathbf{w^{(1)}}\) is an h x [m + 1] dimensional weight matrix, where h is the number of neurons in the hidden layer.
- \(\mathbf{z^{(2)}}\) is an h x 1 dimensional vector: since \(\mathbf{z^{(2)} = w^{(1)}x^{(1)}}\), an h x [m + 1] dimensional matrix multiplied by an [m + 1] x 1 dimensional features vector gives an h x 1 dimensional vector. If you don't know why, you can refer to this page: Matrix vector multiplication.
- \(\mathbf{a^{(2)}}\) is the activation of \(\mathbf{z^{(2)}}\) with one bias unit added on top, so \(\mathbf{a^{(2)}}\) is an [h + 1] x 1 dimensional vector (a small NumPy sketch of this step follows right after this list).
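To make these dimensions concrete, here is a minimal NumPy sketch of the vectorized computation for a single sample; the sizes m = 2 and h = 3 and the random values are assumptions chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, h = 2, 3                               # assumed: 2 features, 3 hidden neurons
rng = np.random.RandomState(1)

x = np.r_[1.0, rng.uniform(size=m)]       # x^(1): [m + 1] x 1 vector, bias x0 = 1 in front
W1 = rng.uniform(size=(h, m + 1))         # w^(1): h x [m + 1] weight matrix

z2 = W1.dot(x)                            # z^(2): h x 1 vector
a2 = np.r_[1.0, sigmoid(z2)]              # a^(2): [h + 1] x 1 vector, bias unit prepended

print(x.shape, W1.shape, z2.shape, a2.shape)   # (3,) (3, 3) (3,) (4,)
```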
\(\mathbf{Z^{(2)} = W^{(1)}X^{(1)}}\)
\(\mathbf{A^{(2)} = \phi\left(Z^{(2)}\right)}\)
Where,
- \(\mathbf{X^{(1)}}\) is an n x [m + 1] dimensional features matrix, where the number of rows n is the number of training samples and the number of columns m + 1 is the number of features plus the bias.
- \(\mathbf{W^{(1)}}\) is an h x [m + 1] dimensional weight matrix, where h is the number of neurons in the hidden layer.
- \(\mathbf{Z^{(2)}}\) is an h x n dimensional matrix: since \(\mathbf{Z^{(2)} = W^{(1)}X^{(1)}}\), the h x [m + 1] dimensional weight matrix is multiplied by the features matrix (transposed to [m + 1] x n, see the note below) to obtain an h x n dimensional matrix.
- \(\mathbf{A^{(2)}}\) is the activation of \(\mathbf{Z^{(2)}}\) with one bias row added, so \(\mathbf{A^{(2)}}\) is an [h + 1] x n dimensional matrix (a sketch of this batched step follows right after this list).
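Here is the same hidden-layer computation sketched for all n training samples at once; note the transpose of \(\mathbf{X^{(1)}}\), which is explained in the note right after this sketch (the sizes and random values are again made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m, h = 5, 2, 3                                   # assumed: 5 samples, 2 features, 3 hidden neurons
rng = np.random.RandomState(1)

X = np.c_[np.ones(n), rng.uniform(size=(n, m))]     # X^(1): n x [m + 1], bias column of ones
W1 = rng.uniform(size=(h, m + 1))                   # W^(1): h x [m + 1]

Z2 = W1.dot(X.T)                                    # transpose X^(1) to [m + 1] x n -> Z^(2) is h x n
A2 = np.vstack([np.ones(n), sigmoid(Z2)])           # A^(2): [h + 1] x n, bias row of ones on top

print(X.shape, Z2.shape, A2.shape)                  # (5, 3) (3, 5) (4, 5)
```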
Note that in order to perform the multiplication \(\mathbf{W^{(1)}X^{(1)}}\), we first transpose \(\mathbf{X^{(1)}}\) to [m + 1] x n dimensions; then the h x [m + 1] dimensional weight matrix multiplied by the [m + 1] x n dimensional features matrix gives the correct h x n dimensional result. So, we are almost done. Doing the same for the output layer, we have
\(\mathbf{Z^{(3)} = W^{(2)}A^{(2)}}\)
\(\mathbf{A^{(3)} = \phi\left(Z^{(3)}\right)}\)
Where,
- \(\mathbf{W^{(2)}}\) is a t x [h + 1] dimensional weight matrix, where t is the number of neurons in the output layer and h + 1 is the number of hidden neurons plus the bias.
- \(\mathbf{Z^{(3)}}\) is a t x n dimensional matrix: since \(\mathbf{Z^{(3)} = W^{(2)}A^{(2)}}\), the t x [h + 1] dimensional weight matrix multiplied by the [h + 1] x n dimensional hidden-layer output matrix gives a t x n dimensional matrix.
- \(\mathbf{A^{(3)}}\) has the same dimensions as \(\mathbf{Z^{(3)}}\), because we simply map \(\mathbf{Z^{(3)}}\) element-wise through the Sigmoid function to obtain \(\mathbf{A^{(3)}}\) (a sketch of the complete forward pass follows below).
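Putting the two steps together, here is a minimal sketch of a complete forward pass through the 3-layer network; the helper name forward_propagation and all sizes are assumptions made for illustration, not the final implementation used later in this tutorial:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, W1, W2):
    """One forward pass through a 3-layer network.

    X  : n x m feature matrix (no bias column yet)
    W1 : h x [m + 1] weights between input and hidden layer
    W2 : t x [h + 1] weights between hidden and output layer
    """
    n = X.shape[0]
    X1 = np.c_[np.ones(n), X]                   # add bias column -> n x [m + 1]
    Z2 = W1.dot(X1.T)                           # h x n
    A2 = np.vstack([np.ones(n), sigmoid(Z2)])   # add bias row -> [h + 1] x n
    Z3 = W2.dot(A2)                             # t x n
    A3 = sigmoid(Z3)                            # t x n network output
    return Z2, A2, Z3, A3

# assumed shapes: 5 samples, 2 features, 3 hidden neurons, 1 output neuron
rng = np.random.RandomState(1)
X = rng.uniform(size=(5, 2))
W1 = rng.uniform(size=(3, 2 + 1))
W2 = rng.uniform(size=(1, 3 + 1))
Z2, A2, Z3, A3 = forward_propagation(X, W1, W2)
print(A3.shape)   # (1, 5): one prediction per sample
```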

