TrisZaska's Machine Learning Blog

Understanding the Logistic cost function in Neural Networks

1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
4. Multi-layer neural network
5. Install and using Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Logistic cost function

Firstly, let's define what a cost function is. In Machine Learning, the cost function is a function that returns a real number representing how well our model performs; it's also called the loss or error function, and we want it to be as close to 0 as possible. So, in Neural Networks, we need to optimize the weights to achieve a minimal cost function, right? There are many cost functions out there, but since our Neural Network is built for classification (which we'll actually implement in section 5), we'll just discuss one of them, the so-called Logistic cost function, which pairs naturally with the logistic (sigmoid) activation function and has the formula,

\(J(w) = -\sum_{i = 1}^{n}\left[y^{(i)}\log\left(a^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - a^{(i)}\right)\right] \hspace{1cm} (9)\)
Where,
  • \(y^{(i)}\) is the target output (true label) for the \(i^{th}\) training sample
  • \(a^{(i)}\) is the actual output (predicted value) for the \(i^{th}\) training sample
  • \(n\) is the number of training samples, indexed by \(i\)
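To make equation \((9)\) concrete, here is a minimal NumPy sketch for a single output neuron; the function name logistic_cost and the toy arrays are just my illustration, not code from section 5,

```python
import numpy as np

def logistic_cost(y, a):
    """Logistic (cross-entropy) cost of equation (9).

    y : array of true labels (0 or 1), one entry per training sample
    a : array of predicted outputs a^(i), values in (0, 1)
    """
    # clip predictions away from exactly 0 and 1 so log() stays finite
    a = np.clip(a, 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# toy example with 3 training samples
y = np.array([1, 0, 1])
a = np.array([0.9, 0.2, 0.8])   # fairly good predictions -> small cost
print(logistic_cost(y, a))      # ~ 0.55
```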
So, why do we have equation \((9)\)? What is the idea behind it? Basically, equation \((9)\) is the combination of two equations,
\begin{equation} \begin{cases} -\log\left(a^{(i)}\right) & \text{if } y^{(i)} = 1\\ -\log\left(1 - a^{(i)}\right) & \text{if } y^{(i)} = 0 \end{cases} \end{equation}
Here is the idea behind these two equations, illustrated below,
As we can see, when \(y^{(i)} = 1\): if \(a^{(i)} = 1\), \(J(w)\) is very close to 0.0, right? But if \(a^{(i)} = 0\), \(J(w)\) becomes very large. It's similar when \(y^{(i)} = 0\). So our mission is to try to minimize this cost function in order to achieve the correct output.
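For a quick numerical check of this behaviour (just an illustration, not code from the post), we can evaluate the two branches at a few values of \(a^{(i)}\),

```python
import numpy as np

# case y = 1: cost is -log(a); tiny when a is near 1, huge when a is near 0
for a in [0.99, 0.5, 0.01]:
    print(a, -np.log(a))        # 0.01, 0.69, 4.61

# case y = 0: cost is -log(1 - a); the mirror image
for a in [0.01, 0.5, 0.99]:
    print(a, -np.log(1 - a))    # 0.01, 0.69, 4.61
```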
Again, here we only calculated the cost function of one neuron in the output layer, right? But there are many neurons in the output layer in the multi-class classification case, and we must calculate the cost over all of them. So let's re-define our cost function formula,

\(J(W) = -\sum_{i = 1}^{n}\sum_{j = 1}^{t}\left[y^{(i)}_j\log \left( a^{(i)}_j\right) + \left(1 - y^{(i)}_j\right)\log\left(1 - a^{(i)}_j\right)\right]\)
Where,
  • \(t\) is the number of neurons in the output layer
  • \(y^{(i)}_j\) is the target output matrix, using the one-hot representation we will discuss later
  • \(a^{(i)}_j\) is the actual output of the \(j^{th}\) neuron for the \(i^{th}\) training sample
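Here is a hedged sketch of this multi-class version, assuming Y and A are (n samples x t output neurons) matrices with Y one-hot encoded as mentioned above; the names logistic_cost_multi, Y and A are mine, not the post's,

```python
import numpy as np

def logistic_cost_multi(Y, A):
    """Cost summed over all n training samples and all t output neurons.

    Y : (n, t) one-hot target matrix, Y[i, j] = y_j^(i)
    A : (n, t) matrix of network outputs, A[i, j] = a_j^(i)
    """
    A = np.clip(A, 1e-10, 1 - 1e-10)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# 2 training samples, 3 output neurons (true classes: 0 and 2)
Y = np.array([[1, 0, 0],
              [0, 0, 1]])
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.1, 0.7]])
print(logistic_cost_multi(Y, A))
```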
We also add a Regularization term to the cost function to prevent over-fitting; we will discuss that in detail in the topic of Regularization. So, finally, do you remember our mission? We try to minimize our cost function by calculating the partial derivative of the logistic cost function with respect to each weight \(w\) in our network. Why do we do that? If you're familiar with calculus, everything is okay; if you're not yet, don't worry, just keep in mind the idea that "a partial derivative is a rate of change, so we use it to see how much a change in a weight will affect the error",

\(\frac{\partial J(W)}{\partial W^{(l)}_{i,j}}\)
Where,
  • \(\partial\) is the sign of the Partial Derivative
  • \(l\) is the \(l^{th}\) layer
  • \(W^{(l)}_{i,j}\) is the weight between the \(i^{th}\) neuron in layer \(l\) and the \(j^{th}\) neuron in layer \(l + 1\)
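If the "rate of change" idea still feels abstract, a finite-difference check makes it concrete: nudge one weight by a tiny amount, recompute the cost, and the ratio approximates \(\frac{\partial J(W)}{\partial W^{(l)}_{i,j}}\). The sketch below is only an illustrative (and very slow) sanity check with made-up names, not how the network will actually compute gradients; that is exactly what Backpropagation is for,

```python
import numpy as np

def numerical_gradient(cost_fn, W, l, i, j, eps=1e-5):
    """Approximate dJ/dW[l][i, j] by central differences.

    cost_fn : function taking the list of weight matrices and returning J(W)
    W       : list of weight matrices, one matrix per layer
    """
    W_plus  = [w.copy() for w in W]
    W_minus = [w.copy() for w in W]
    W_plus[l][i, j]  += eps     # nudge one single weight up ...
    W_minus[l][i, j] -= eps     # ... and down
    return (cost_fn(W_plus) - cost_fn(W_minus)) / (2 * eps)

# toy check on a made-up quadratic "cost" J(W) = sum of squared weights,
# whose true derivative for any weight w is 2 * w
W = [np.array([[0.5, -0.3], [0.1, 0.2]])]
cost = lambda Ws: sum((w ** 2).sum() for w in Ws)
print(numerical_gradient(cost, W, l=0, i=0, j=0))   # ~ 1.0 = 2 * 0.5
```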
So, we are done with step 2 and move on to the final step: in the next section, on Backpropagation, we'll discuss in more detail how Backpropagation works and how to use it to compute these partial derivatives.
