1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Logistic cost function
First, let's define what a cost function is. In Machine Learning, the cost function is a function that returns a real number representing how well our model performs; it is also called the loss or error function, and we want it to be as close to 0 as possible. So, in Neural Networks, we need to optimize the weights to minimize the cost function. There are many cost functions out there, but since the Neural Network we actually implement in section 5 is used for classification, we only discuss one of them here, the so-called Logistic cost function, which is derived from the logistic (sigmoid) activation function and has the formula,
\(J(w) = -\sum_{i = 1}^{n}\left[y^{(i)}\log\left(a^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - a^{(i)}\right)\right] \hspace{1cm} (9)\)
Where,
- \(y^{(i)}\) is the target output (true label) of the \(i^{th}\) training sample
- \(a^{(i)}\) is the actual output (predicted label) of the network for that sample
- \(n\) is the number of training samples, and \(i\) indexes the \(i^{th}\) sample
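As a concrete illustration of equation (9), here is a minimal NumPy sketch; the function name `logistic_cost`, the variable names, and the example values are assumptions made for illustration only, not code from the later implementation in section 5.

```python
import numpy as np

def logistic_cost(y, a, eps=1e-12):
    """Logistic cost J(w) from equation (9).

    y   : array of true labels (0 or 1), shape (n,)
    a   : array of predicted activations in (0, 1), shape (n,)
    eps : small constant to avoid taking log(0)
    """
    a = np.clip(a, eps, 1 - eps)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Example: three training samples whose predictions mostly agree with the targets
y = np.array([1, 0, 1])
a = np.array([0.9, 0.2, 0.7])
print(logistic_cost(y, a))   # small value -> the model performs well on these samples
```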
Looking at a single training sample, this cost reduces to one of two cases,
\(\begin{cases} -\log\left(a^{(i)}\right) & \text{if } y^{(i)} = 1\\ -\log\left(1 - a^{(i)}\right) & \text{if } y^{(i)} = 0 \end{cases}\)
Here is the idea behind these two cases:
As we can see, when \(y^{(i)} = 1\): if \(a^{(i)}\) is close to 1, \(J(w)\) is very close to 0; if \(a^{(i)}\) approaches 0, \(J(w)\) becomes very large. The situation is analogous when \(y^{(i)} = 0\). So, our mission is to minimize this cost function in order to push the network towards the correct outputs.
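To make the two cases concrete, a throwaway snippet (with assumed example values, not taken from the post) shows how cheap a confident correct prediction is and how expensive a confident wrong one is:

```python
import numpy as np

# Case y = 1: the cost is -log(a)
print(-np.log(0.99))      # ~0.01 -> prediction close to the target, cost near 0
print(-np.log(0.01))      # ~4.61 -> prediction far from the target, cost is large

# Case y = 0: the cost is -log(1 - a)
print(-np.log(1 - 0.01))  # ~0.01
print(-np.log(1 - 0.99))  # ~4.61
```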
Again, so far we have only calculated the cost function of a single neuron in the output layer. But in the multi-classification case there are many neurons in the output layer, and we must sum over all of them. So let's re-define our cost function formula,
\(J(W) = -\sum_{i = 1}^{n}\sum_{j = 1}^{t}\left[y^{(i)}_j\log \left( a^{(i)}_j\right) + \left(1 - y^{(i)}_j\right)\log\left(1 - a^{(i)}_j\right)\right]\)
Where,
- \(t\) is the number of neurons in the output layer
- \(y^{(i)}_j\) is the target output in one-hot representation (we will discuss this later) for the \(j^{th}\) neuron on the \(i^{th}\) training sample
- \(a^{(i)}_j\) is the actual output of the \(j^{th}\) neuron on the \(i^{th}\) training sample
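Here is a hedged NumPy sketch of this double-sum version, assuming `Y` and `A` are \(n \times t\) matrices holding the one-hot targets and the network outputs; the names are illustrative and not taken from the implementation in section 5.

```python
import numpy as np

def logistic_cost_multiclass(Y, A, eps=1e-12):
    """Multi-class logistic cost J(W): sum over samples i and output neurons j.

    Y   : one-hot target matrix, shape (n, t)
    A   : predicted activation matrix, shape (n, t)
    eps : small constant to avoid taking log(0)
    """
    A = np.clip(A, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Example: two training samples, three output neurons (classes)
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.1, 0.7]])
print(logistic_cost_multiclass(Y, A))
```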
To minimize this cost function with Gradient Descent, we need to calculate the partial derivative of \(J(W)\) with respect to every weight in the network,
\(\frac{\partial J(W)}{\partial W^{(l)}_{i,j}}\)
Where,
- \(\partial\) is the partial derivative symbol
- \(l\) is the \(l^{th}\) layer
- \(W^{(l)}_{i,j}\) is the weight connecting the \(i^{th}\) neuron in layer \(l\) to the \(j^{th}\) neuron in layer \(l + 1\)
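Once these partial derivatives are available (computing them is exactly the job of Backpropagation, covered in section 4.4), every weight matrix is updated by a Gradient Descent step, just like we did for Adaline in section 3.2. Below is a minimal sketch of that update, assuming `grads` holds the matrices \(\partial J / \partial W^{(l)}\) and `eta` is the learning rate; both names are illustrative.

```python
def gradient_descent_step(weights, grads, eta=0.01):
    """One Gradient Descent step: W^(l) <- W^(l) - eta * dJ/dW^(l).

    weights : list of weight matrices, one per layer
    grads   : list of partial-derivative matrices, same shapes as weights
    eta     : learning rate
    """
    return [W - eta * dW for W, dW in zip(weights, grads)]
```

The hard part for a multi-layer network is not this update itself but computing `grads` efficiently, which is what the Backpropagation part addresses.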