TrisZaska's Machine Learning Blog

What are the problems of the Perceptron that caused the AI Winter?

1. Introduction
2. History and overview of Artificial Neural Networks
3. Single neural network
4. Multi-layer neural network
5. Implement and use a Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Problems with Perceptron (AI Winter)

After the two exercises, as you may have noticed, there is a similarity between the two datasets we used, right? If you look at the class labels \(y\), you may wonder why the labels go neatly from 0 to 1 instead of being mixed together. This is an instance of Linear Separability, so what is it, and why can a Single Neural Network work on linear data but not on non-linear data?
Firstly, let's define linear and non-linear. According to Wikipedia, linearity is the property of a mathematical relationship or function that can be graphically represented as a straight line, and non-linearity is the opposite. So linear data is data that can be separated into two classes with a straight line; if it cannot, it is non-linear data.
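As a quick illustration, here is a toy sketch in Python (the points and the separating line \(x + y - 1.5 = 0\) are made up for this example): the data below is linearly separable because one straight line puts every class-0 point on one side and every class-1 point on the other,

```python
import numpy as np

# Toy 2D dataset: class 0 points cluster near the origin, class 1 points further away
X = np.array([[0.2, 0.1], [0.5, 0.4], [0.3, 0.8],   # class 0
              [1.5, 1.2], [2.0, 0.9], [1.1, 1.8]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# The straight line x + y - 1.5 = 0 separates the two classes:
side = X[:, 0] + X[:, 1] - 1.5 >= 0
print(side.astype(int))  # [0 0 0 1 1 1] -> matches y, so the data is linearly separable
```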
If you want to know more about Linear Separability, this Wikipedia page is for you; it is very intuitive and easy to understand. Coming back to the Perceptron, to answer why the Perceptron only works on linear data, let's first look at equation \((1)\) of the Perceptron,
\(z = w_0x_0 + w_1x_1 + w_2x_2 + ... + w_mx_m = \sum_{j = 0}^{m} w_jx_j\)
Typically, this is a linear combination itself; in fact, we try to split our data into two classes using the hyperplane \(z = 0\). For simplicity, assume we are in 2D with just two features, \(W = (a, b)\) and \(X = (x, y)\). The predicted class will be 1 if \(WX \geqslant 0\) and 0 otherwise, which can be written as,
\(WX = ax + by \)
\(WX \geqslant 0 \Leftrightarrow ax + by \geqslant 0\)
\(\Leftrightarrow y \geqslant \frac{-ax}{b} \hspace{1cm}(8)\)
Equation \((8)\) assumes \(b > 0\) when dividing; if \(b < 0\) the inequality flips, but the decision boundary is still the straight line \(y = \frac{-ax}{b}\). In other words, the Perceptron draws a linear boundary, so it can only solve linearly separable problems.
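To make this concrete, here is a minimal sketch in Python with NumPy of the Perceptron decision rule (the weights and feature values are made up for illustration): the prediction depends only on which side of the straight line \(ax + by = 0\) a point falls on,

```python
import numpy as np

def perceptron_predict(W, X):
    """Perceptron decision rule: output 1 if the net input z = w^T x >= 0, else 0."""
    z = X.dot(W)               # net input: a linear combination of the features
    return np.where(z >= 0, 1, 0)

# Two features (x, y) and illustrative weights W = (a, b)
W = np.array([1.0, 2.0])       # a = 1, b = 2
X = np.array([[ 1.0,  1.0],    # ax + by =  3.0 >= 0 -> class 1
              [-1.0,  0.2],    # ax + by = -0.6 <  0 -> class 0
              [ 2.0, -1.0]])   # ax + by =  0.0 >= 0 -> class 1

print(perceptron_predict(W, X))  # [1 0 1]
```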
Alright, let's go back to the 1970s and the XOR problem that caused the AI "Winter". We will do some simple mathematics to understand this problem,
Look at the XOR truth table from top to bottom; for each row we calculate the net input \(z = w^Tx = w_0x_0 + w_1x_1 + w_2x_2\) (with bias input \(x_0 = 1\)) and write down the condition the weights must satisfy:
\(x_1 = 0, x_2 = 0, y = 0 \Rightarrow w_0 < 0\)
\(x_1 = 0, x_2 = 1, y = 1 \Rightarrow w_0 + w_2 \geqslant 0\)
\(x_1 = 1, x_2 = 0, y = 1 \Rightarrow w_0 + w_1 \geqslant 0\)
\(x_1 = 1, x_2 = 1, y = 0 \Rightarrow w_0 + w_1 + w_2 < 0\)
As we can see, adding the second and third conditions gives \(2w_0 + w_1 + w_2 \geqslant 0\), so \(w_0 + w_1 + w_2 \geqslant -w_0 > 0\) by the first condition, which contradicts the fourth condition. No set of weights satisfies all four, therefore XOR cannot be solved by the Perceptron. This problem caused neural network research to fade for about 15 years, until it was revived by the Multi-layer Neural Network, which not only solved this problem but also laid the groundwork for Deep Learning, which is very successful today; we are going to discuss it together in the next topic.
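If you prefer to see this failure in code rather than in inequalities, below is a small sketch in Python with NumPy (the learning rate, random seed, and number of epochs are arbitrary choices for illustration) that runs the classic Perceptron learning rule on the XOR data: no matter how long it trains, a single linear boundary can classify at most 3 of the 4 points,

```python
import numpy as np

# XOR truth table: two inputs and the target label
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Prepend the bias input x0 = 1, matching z = w_0*x_0 + w_1*x_1 + w_2*x_2
Xb = np.hstack([np.ones((4, 1)), X])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)        # small random initial weights

# Classic Perceptron learning rule; on non-separable data it never converges
for epoch in range(1000):
    for xi, target in zip(Xb, y):
        pred = 1 if np.dot(w, xi) >= 0 else 0
        w += 0.1 * (target - pred) * xi  # update only when a point is misclassified

pred = (Xb @ w >= 0).astype(int)
print("predictions:", pred, "targets:", y)
print("accuracy:", (pred == y).mean())   # never reaches 1.0: a line fits at most 3 of the 4 points
```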
