TrisZaska's Machine Learning Blog

The math behind the Perceptron rule

1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
4. Multi-layer neural network
5. Install and using Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

The Perceptron rule

The Perceptron rule is the "heart" that makes the Perceptron learn, in 2 easy steps:
  1. Initialize the weights to 0 or to small random values between -1 and 1
  2. For each training sample \(x^{(i)}\):
    • Calculate the error between the target output and the actual output
    • Update the weights
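The two steps above can be sketched in Python roughly like this (a minimal illustration; the function name `perceptron_train`, the NumPy usage, and the random seed are my own choices, not from the post):

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=10):
    # Step 1: initialize small random weights in [-1, 1]; w[0] is the bias
    rng = np.random.default_rng(0)
    w = rng.uniform(-1.0, 1.0, X.shape[1] + 1)
    # Step 2: for each training sample, compute the error and update the weights
    for _ in range(epochs):
        for xi, target in zip(X, y):
            output = 1 if w[0] + xi @ w[1:] >= 0.0 else -1  # threshold unit
            error = target - output                         # error E
            w[1:] += eta * error * xi                       # update weights
            w[0] += eta * error                             # bias input is 1
    return w
```

For example, training on the logical OR data (labels in {-1, 1}), which is linearly separable, yields weights that classify all four points correctly.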
So, let's discuss the Perceptron rule in detail. Firstly, what is the main idea of the Perceptron learning rule? Looking at the steps above, the rule tries to minimize the error by adjusting the weights, so that the next time around we hope the error will shrink toward 0, right? As we mentioned before, for each training sample \(x^{(i)}\), we update the weight of each \(j^{th}\) neuron with the formula,
$$w_j = w_j + \Delta w_j$$
Where, \( \Delta w_j = \eta (target^{(i)} - output^{(i)})x^{(i)}_j\hspace{1cm}(2)\)
In this equation \((2)\), where
  • \(\eta\) is the learning rate, a constant between 0.0 and 1.0
  • \(target^{(i)}\) is the true class label
  • \(output^{(i)}\) is the currently predicted class label
  • \(x^{(i)}_j\) is the \(j^{th}\) input of the \(i^{th}\) training sample
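To make equation \((2)\) concrete, here is a one-line sketch of the weight change for a single input (the function name `delta_w` is only illustrative):

```python
def delta_w(eta, target, output, x):
    # Equation (2): weight change for one input of one training sample
    return eta * (target - output) * x

# Misclassified sample: target = 1, output = -1, so (target - output) = 2
step = delta_w(0.1, 1, -1, 0.5)   # 0.1 * 2 * 0.5 = 0.1
```

Note that when the prediction is correct, \((target - output) = 0\) and the weight does not change at all.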
I assume you're a very curious, enthusiastic person and want to dig deeper into the equation of the Perceptron learning rule. You may wonder:
  • What is the learning rate \(\eta\) and why is \(\eta\) important?
  • Why do we have the term \((target^{(i)} - output^{(i)})\)?
  • Why do we multiply by the extra input \(x^{(i)}_j\)?
Let's discuss each question right now:
  • What is the learning rate \(\eta\) and why is \(\eta\) important?
  • The learning rate \(\eta\) is a constant between 0.0 and 1.0 that controls how far we move each time we try to minimize the error in an epoch. Why \(\eta\) is important and how to choose it properly will be discussed in more detail in the upcoming topic on Multi-layer Neural Networks.
  • Why do we have the term \((target^{(i)} - output^{(i)})\)?
  • Because we want to update the weights of the Perceptron to minimize the error. But what is the error? In this scope, the error \((E)\) is the distance between the true output and the predicted output, so we subtract what we actually got from what we want, to see how far off we are before we update the weights. Remember that the output of the Perceptron takes just two values, -1 and 1, so with \(E^{(i)} = target^{(i)} - output^{(i)}\):
    • If \(E^{(i)} = 0\), it means \(target^{(i)} = output^{(i)}\), so nothing changes; our Perceptron works properly
    • If \(E^{(i)} < 0\), it means \(target^{(i)} = -1\) but \(output^{(i)} = 1\), so \(E = -2\); therefore we want to decrease \(w\), and the next time \(output^{(i)}\) will move toward \(-1\)
    • If \(E^{(i)} > 0\), it means \(target^{(i)} = 1\) but \(output^{(i)} = -1\), so \(E = +2\); therefore we want to increase \(w\), and the next time \(output^{(i)}\) will move toward \(1\)
  • Why do we multiply by the extra input \(x^{(i)}_j\)?
  • This is a tricky question, but understanding why we multiply by the extra input \(x^{(i)}_j\) is key to understanding why the Perceptron rule can minimize the error. Let's start with the error \((E)\) from the previous question. There is nothing to say when \(E = 0\), but when the Perceptron predicts wrongly, how do we update the weights? For simplicity, we skip the learning rate \(\eta\), the superscript \(i\), and the subscript \(j\) in this question,
    • Assume input \(x\) is POSITIVE
      • If \(E < 0\), we already calculated \(E = -2\), so we want to decrease \(w\), right? \(w = w + \Delta w\), \(\Delta w = E = -2\), so \(w = w - 2\) (\(w\) decreased)
      • \(\Rightarrow\) When \(x \geqslant 0\), if \(w\) decreased, \(\mathbf{z = w^Tx}\) will decrease; that's right, because we want the net input \(z\) to decrease so that \(g(z)\) gets close to -1
      • If \(E > 0\), we already calculated \(E = +2\), so we want to increase \(w\), right? \(w = w + \Delta w\), \(\Delta w = E = +2\), so \(w = w + 2\) (\(w\) increased)
      • \(\Rightarrow\) When \(x \geqslant 0\), if \(w\) increased, \(\mathbf{z = w^Tx}\) will increase; that's also right, because we want the net input \(z\) to increase so that \(g(z)\) gets close to 1
    • But what if input \(x\) is NEGATIVE?
      • Increasing \(w\) will decrease the net input \(z\), so \(g(z)\) will get close to -1, but what we want is 1. What's wrong? Because when \(x < 0\) and \(w\) increases, \(w_jx_j\) becomes a large negative number; when we sum, the net input \(z = \mathbf{w^Tx}\) will contain that large negative term and definitely decreases.
      • Similarly, decreasing \(w\) will increase the net input \(z\).
    \(\Rightarrow\) That's why we need to multiply by the extra \(x^{(i)}_j\) when updating \(w\): it solves the problem when \(x^{(i)}_j\) is negative.
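The negative-input case above can be checked with a tiny numeric sketch (one feature, bias and learning rate omitted; the numbers are my own illustration):

```python
# One feature x with weight w; the output is the sign of z = w * x
x, w = -1.0, 0.5          # negative input; z = -0.5, so output = -1
target, output = 1, -1    # we want +1
E = target - output       # E = +2

# Without multiplying by x: w grows, but z moves the WRONG way
z_without_x = (w + E) * x         # (0.5 + 2) * -1 = -2.5

# With the extra factor x: w shrinks, and z moves toward the target
z_with_x = (w + E * x) * x        # (0.5 - 2) * -1 = +1.5
```

So multiplying by \(x\) flips the direction of the update exactly when the input is negative, which is what makes the rule push \(z\) the right way in both cases.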
