TrisZaska's Machine Learning Blog

Overview about MNIST data

1. Introduction
2. History and Overview about Artificial Neural Network
3. Single neural network
4. Multi-layer neural network
5. Install and using Multi-layer Neural Network to classify MNIST data
6. Summary
7. References

Overview about MNIST data

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits collected by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. It consists of 70,000 handwritten digits, each a 28x28 pixel image, written by 500 different writers. It is divided into two parts: 60,000 handwritten digits for training and 10,000 for testing. It's available on Yann's page, or you can download it directly from here.
At the time of writing, the lowest error rate reported on this data is 0.21%, achieved with a Convolutional Neural Network, which is really cool. In this final exercise, we'll write some code to implement a "naive" Multi-layer Neural Network that learns from and predicts on this data. Since the data is stored in a binary format, we can't open it with a regular application, but we can read it with some simple Python code. Let's take a look at the kind of data we'll be working with:
First 25 images of MNIST data

How to load MNIST data?

 
Before implementing the Neural Network, let's first write some code to load the MNIST data into features train, features test, labels train and labels test.
### Import some needed libraries
Some notes that may help you avoid struggling with the code snippets below:
  • "I" is an unsigned integer of 4 bytes. We need "IIII" (16 bytes) to read the description of the image dataset and "II" (8 bytes) to read the description of the label dataset
  • ">" means big-endian. If you don't know what big-endian is, take a look at the Wikipedia page
  • "data_description" is the tuple containing the description of the data: data_description = (magic number, number of images, rows, columns)
  • ".read(16)" in the image-loading code reads the 16-byte description at the start of the file, so the pixel data that follows is read from offset 0016 to the end of the file. Loading the labels works the same way with .read(8)
  • "dtype=np.uint8" tells NumPy how large each element is: np.uint8 = 1 byte, so every pixel and every label is read as one unsigned byte. Check some other NumPy data types on this page
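To make the notes above concrete, here is a minimal sketch of how the ">IIII" and ">II" format strings behave. It does not touch the real files: the header is built in memory, and the values 2051 (the magic number of MNIST image files), 60000, 28 and 28 are filled in by hand to mimic the training image file.

```python
import struct

# Build a fake 16-byte image-file header by hand: four big-endian
# unsigned 32-bit integers (magic number, number of images, rows, columns).
header = struct.pack(">IIII", 2051, 60000, 28, 28)

# "I" is 4 bytes, so "IIII" covers 16 bytes and "II" covers 8 bytes.
print(struct.calcsize(">IIII"))  # 16
print(struct.calcsize(">II"))    # 8

# Unpacking with ">" (big-endian) recovers the description tuple.
data_description = struct.unpack(">IIII", header)
print(data_description)  # (2051, 60000, 28, 28)

# Unpacking the same bytes as little-endian ("<") gives wrong numbers,
# which is why the ">" prefix matters.
wrong_description = struct.unpack("<IIII", header)
print(wrong_description[0] == 2051)  # False
```

If the magic number does not come out as 2051 for an image file (or 2049 for a label file), the byte order or the file itself is probably wrong.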
### First, let's define a function that takes the folder containing the data and returns features_train, features_test, labels_train and labels_test
### Now pass in the folder on your own computer; remember that the folder must contain the 4 data files
Alright, we're done.
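As a recap, here is a minimal sketch of what such a loading function could look like. It is not necessarily the code originally used in this post: the helper names load_images and load_labels are made up for illustration, and the four file names assume the uncompressed files keep their standard MNIST names.

```python
import os
import struct

import numpy as np


def load_images(path):
    """Read an image file into an array of shape (n_images, rows * cols)."""
    with open(path, "rb") as f:
        # First 16 bytes: magic number, number of images, rows, columns.
        magic, n_images, rows, cols = struct.unpack(">IIII", f.read(16))
        # Everything after the header is pixel data, one byte per pixel.
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(n_images, rows * cols)


def load_labels(path):
    """Read a label file into a 1-D array of digits."""
    with open(path, "rb") as f:
        # First 8 bytes: magic number, number of labels.
        magic, n_labels = struct.unpack(">II", f.read(8))
        # Everything after the header is label data, one byte per label.
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    return labels


def load_mnist(folder):
    """Load the 4 data files from `folder` (file names are an assumption)."""
    features_train = load_images(os.path.join(folder, "train-images-idx3-ubyte"))
    labels_train = load_labels(os.path.join(folder, "train-labels-idx1-ubyte"))
    features_test = load_images(os.path.join(folder, "t10k-images-idx3-ubyte"))
    labels_test = load_labels(os.path.join(folder, "t10k-labels-idx1-ubyte"))
    return features_train, features_test, labels_train, labels_test
```

Calling load_mnist on the folder that holds the real files should give features_train with shape (60000, 784) and features_test with shape (10000, 784), since each 28x28 image is flattened into 784 values. Note that np.frombuffer returns read-only arrays, so call .copy() on them if you need to modify the data in place.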
