1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
- 3.1 Perceptron
- 3.1.1 The Unit Step function
- 3.1.2 The Perceptron rules
- 3.1.3 The bias term
- 3.1.4 Implement Perceptron in Python
- 3.2 Adaptive Linear Neurons
- 3.2.1 Gradient Descent rule (Delta rule)
- 3.2.2 Learning rate in Gradient Descent
- 3.2.3 Implement Adaline in Python to classify Iris data
- 3.2.4 Learning via types of Gradient Descent
- 3.3 Problems with Perceptron (AI Winter)
4. Multi-layer Neural Network
- 4.1 Overview about Multi-layer Neural Network
- 4.2 Forward Propagation
- 4.3 Cost function
- 4.4 Backpropagation
- 4.5 Implement simple Multi-layer Neural Network to solve the problem of Perceptron
- 4.6 Some optional techniques for Multi-layer Neural Network Optimization
- 4.7 Multi-layer Neural Network for binary/multi classification
- 5.1 Overview about MNIST data
- 5.2 Implement Multi-layer Neural Network
- 5.3 Debugging Neural Network with Gradient Descent Checking
7. References
Problems with Perceptron (AI Winter)
After these two exercises, you may have noticed a similarity between the two datasets we used: if you look at the class labels \(y\), they go from 0 to 1 in order rather than being shuffled. Both datasets are instances of linearly separable data. So what is linear separability, and why does a Single Neural Network work on linear data but not on non-linear data?

First, let's define linear and non-linear. According to Wikipedia, linearity is the property of a mathematical relationship or function that can be graphically represented as a straight line, and non-linearity is the opposite. So linear data is data that can be separated into two classes with a straight line; if it cannot be, it is non-linear data.
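As a concrete illustration, here is a minimal sketch with made-up example points (not the Iris data used earlier); the two classes can be separated by the straight line \(x + y = 5\):

```python
import numpy as np

# Toy example of linearly separable data (assumed points, for illustration only).
class_0 = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5]])
class_1 = np.array([[4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])

# The line x + y = 5 separates them: class 0 falls below it, class 1 above.
print(all(p.sum() < 5 for p in class_0))   # True
print(all(p.sum() > 5 for p in class_1))   # True
```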
If you want to know more about linear separability, the Wikipedia page on Linear Separability is very intuitive and clear. Coming back to the Perceptron: to answer why it can only work on linear data, let's first look at equation \((1)\) of the Perceptron,
\(z = w_0x_0 + w_1x_1 + w_2x_2 + ... + w_mx_m = \sum_{j = 0}^{m} w_jx_j\)
This is simply a linear combination: in effect, we try to split our data into two classes with the hyperplane \(z = 0\). For simplicity, assume a 2D case with just two features, \(W = (a, b)\) with \(b > 0\), and \(X = (x, y)\). The prediction will be 1 if \(WX \geqslant 0\) and 0 otherwise, which can be written as,
\(WX = ax + by \)
\(WX \geqslant 0 \Leftrightarrow ax + by \geqslant 0\)
\(\Leftrightarrow y \geqslant \frac{-ax}{b} \hspace{1cm}(8)\)
Equation \((8)\) is a linear equation, so the Perceptron can only solve linearly separable problems.
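To make this concrete, here is a minimal sketch of the 2D decision rule in equation \((8)\), with arbitrary illustrative weights \(a\) and \(b\) (assumed values, not learned ones):

```python
# A minimal sketch of the 2D Perceptron decision rule from equation (8).
a, b = 1.0, 2.0   # W = (a, b), with b > 0 as assumed above

def predict(x, y):
    """Unit step over the net input: return 1 if WX >= 0, else 0."""
    return 1 if a * x + b * y >= 0 else 0

# The decision boundary is the straight line y = -a*x/b; at x = 1 it sits at y = -0.5.
print(predict(1.0, -0.25))   # 1: above the line
print(predict(1.0, -1.00))   # 0: below the line
```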
Alright, let's go back to the 1970s and the XOR problem that contributed to the AI "Winter". We will do some simple mathematics to understand this problem. Consider the XOR truth table:

| \(x_1\) | \(x_2\) | \(y\) |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

Going through the table from top down, we calculate the net input \(z = w^Tx = w_0 + w_1x_1 + w_2x_2\) for each row to see what happens. Since the unit step outputs 1 when \(z \geqslant 0\), a Perceptron that solves XOR would need:

\(w_0 < 0\)

\(w_0 + w_2 \geqslant 0\)

\(w_0 + w_1 \geqslant 0\)

\(w_0 + w_1 + w_2 < 0\)
As we can see, the fourth condition contradicts the second and third: adding those two gives \(2w_0 + w_1 + w_2 \geqslant 0\), and since \(w_0 < 0\) this forces \(w_0 + w_1 + w_2 \geqslant -w_0 > 0\), while the fourth condition requires it to be negative. Therefore XOR cannot be solved by a Perceptron. This problem caused Neural Network research to fade for about 15 years, until the Multi-layer Neural Network revived it: it not only solved this problem but also laid the foundation for the Deep Learning that is so prominent today. We will discuss it in the next topic.
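As a quick empirical check, here is a minimal sketch that tries to train a Perceptron on XOR, assuming the 0/1 unit step and the Perceptron rule from section 3.1.2 and an assumed learning rate of 0.1. The number of misclassifications per epoch never drops to zero, matching the contradiction above:

```python
import numpy as np

# XOR inputs, each prefixed with a constant 1 for the bias term w0.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 1, 1, 0])   # XOR labels
w = np.zeros(3)
eta = 0.1                    # learning rate (assumed value)

for epoch in range(100):
    errors = 0
    for xi, target in zip(X, y):
        output = int(np.dot(w, xi) >= 0)      # unit step prediction
        w += eta * (target - output) * xi     # Perceptron update rule
        errors += int(output != target)
    if errors == 0:
        break

# errors never reaches 0: at least one of the four points stays misclassified.
print(f"epochs run: {epoch + 1}, errors in last epoch: {errors}")
```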