1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Logistic cost function
First, let's define what a cost function is. In Machine Learning, the cost function is a function that returns a real number representing how well our model performs; it is also called the loss or error function, and we want it to be as close to 0 as possible. So, in Neural Networks, we need to optimize the weights to minimize the cost function. There are many cost functions out there, but since the Neural Network we actually implement in section 5 is used for classification, we only discuss one of them here, the so-called Logistic cost function, which is derived from the logistic (sigmoid) activation function and has the formula,
\(J(w) = -\sum_{i = 1}^{n}\left[y^{(i)}\log\left(a^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - a^{(i)}\right)\right] \hspace{1cm} (9)\)
Where,
- \(y^{(i)}\) is the target output (true label) of the \(i^{th}\) training sample
- \(a^{(i)}\) is the actual output (predicted label) of the network for that sample
- \(n\) is the number of training samples, and \(i\) indexes the \(i^{th}\) sample
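As a concrete illustration of equation (9), here is a minimal NumPy sketch; the function name `logistic_cost`, the variable names, and the example values are assumptions made for illustration only, not code from the later implementation in section 5.

```python
import numpy as np

def logistic_cost(y, a, eps=1e-12):
    """Logistic cost J(w) from equation (9).

    y   : array of true labels (0 or 1), shape (n,)
    a   : array of predicted activations in (0, 1), shape (n,)
    eps : small constant to avoid taking log(0)
    """
    a = np.clip(a, eps, 1 - eps)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Example: three training samples whose predictions mostly agree with the targets
y = np.array([1, 0, 1])
a = np.array([0.9, 0.2, 0.7])
print(logistic_cost(y, a))   # small value -> the model performs well on these samples
```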
Looking at a single training sample, this cost reduces to one of two cases,
\(\begin{cases} -\log\left(a^{(i)}\right) & \text{if } y^{(i)} = 1\\ -\log\left(1 - a^{(i)}\right) & \text{if } y^{(i)} = 0 \end{cases}\)
Here is the idea behind these two cases:
As we can see, when \(y^{(i)} = 1\): if \(a^{(i)}\) is close to 1, \(J(w)\) is very close to 0; if \(a^{(i)}\) approaches 0, \(J(w)\) becomes very large. The situation is analogous when \(y^{(i)} = 0\). So, our mission is to minimize this cost function in order to push the network towards the correct outputs.
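To make the two cases concrete, a throwaway snippet (with assumed example values, not taken from the post) shows how cheap a confident correct prediction is and how expensive a confident wrong one is:

```python
import numpy as np

# Case y = 1: the cost is -log(a)
print(-np.log(0.99))      # ~0.01 -> prediction close to the target, cost near 0
print(-np.log(0.01))      # ~4.61 -> prediction far from the target, cost is large

# Case y = 0: the cost is -log(1 - a)
print(-np.log(1 - 0.01))  # ~0.01
print(-np.log(1 - 0.99))  # ~4.61
```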
Again, so far we have only calculated the cost function of a single neuron in the output layer. But in the multi-classification case there are many neurons in the output layer, and we must sum over all of them. So let's re-define our cost function formula,
\(J(W) = -\sum_{i = 1}^{n}\sum_{j = 1}^{t}\left[y^{(i)}_j\log \left( a^{(i)}_j\right) + \left(1 - y^{(i)}_j\right)\log\left(1 - a^{(i)}_j\right)\right]\)
Where,
- \(t\) is the number of neurons in the output layer
- \(y^{(i)}_j\) is the target output in one-hot representation (we will discuss this later) for the \(j^{th}\) neuron on the \(i^{th}\) training sample
- \(a^{(i)}_j\) is the actual output of the \(j^{th}\) neuron on the \(i^{th}\) training sample
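Here is a hedged NumPy sketch of this double-sum version, assuming `Y` and `A` are \(n \times t\) matrices holding the one-hot targets and the network outputs; the names are illustrative and not taken from the implementation in section 5.

```python
import numpy as np

def logistic_cost_multiclass(Y, A, eps=1e-12):
    """Multi-class logistic cost J(W): sum over samples i and output neurons j.

    Y   : one-hot target matrix, shape (n, t)
    A   : predicted activation matrix, shape (n, t)
    eps : small constant to avoid taking log(0)
    """
    A = np.clip(A, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Example: two training samples, three output neurons (classes)
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.1, 0.7]])
print(logistic_cost_multiclass(Y, A))
```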
To minimize this cost function with Gradient Descent, we need to calculate the partial derivative of \(J(W)\) with respect to every weight in the network,
\(\frac{\partial J(W)}{\partial W^{(l)}_{i,j}}\)
Where,
- \(\partial\) is the partial derivative symbol
- \(l\) is the \(l^{th}\) layer
- \(W^{(l)}_{i,j}\) is the weight connecting the \(i^{th}\) neuron in layer \(l\) to the \(j^{th}\) neuron in layer \(l + 1\)
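Once these partial derivatives are available (computing them is exactly the job of Backpropagation, covered in section 4.4), every weight matrix is updated by a Gradient Descent step, just like we did for Adaline in section 3.2. Below is a minimal sketch of that update, assuming `grads` holds the matrices \(\partial J / \partial W^{(l)}\) and `eta` is the learning rate; both names are illustrative.

```python
def gradient_descent_step(weights, grads, eta=0.01):
    """One Gradient Descent step: W^(l) <- W^(l) - eta * dJ/dW^(l).

    weights : list of weight matrices, one per layer
    grads   : list of partial-derivative matrices, same shapes as weights
    eta     : learning rate
    """
    return [W - eta * dW for W, dW in zip(weights, grads)]
```

The hard part for a multi-layer network is not this update itself but computing `grads` efficiently, which is what the Backpropagation part addresses.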