In the previous exercise we set up a Multilayer Perceptron to model the Iris Dataset. Here we consider the [MNIST handwritten digits database](https://en.wikipedia.org/wiki/MNIST_database) to investigate further aspects of setting up Neural Networks. This is a collection of 60,000 training images and 10,000 testing images, which have been digitised onto a 28x28 grid.
%% Cell type:markdown id: tags:
First we import TensorFlow and Keras.
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
```
%% Cell type:markdown id: tags:
We import from `keras` a built-in function to load the MNIST data set. From `keras` we also introduce `Sequential`, which creates a feed-forward neural network, `Dense`, which creates a fully connected layer, and `Activation`, which introduces different activation functions. We also introduce `SGD`, which implements stochastic gradient descent.
We first load the MNIST data set. This comes already split into a training set and a testing set. The features are greyscale images pixellated onto a 28x28 grid, with intensities ranging from 0 to 255. The target values are the digits (0-9) that the images correspond to.
The training data is now split into a training set and a validation set, and the features are normalized so that the pixel values range from 0 to 1.
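A sketch of the corresponding imports and loading cell (assuming the built-in `keras.datasets.mnist` loader; the `Flatten` layer used below is imported here as well):
%% Cell type:code id: tags:
``` python
# model-building blocks and optimizer described above
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import SGD

# the built-in loader returns the data already split into training and testing sets
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
```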
%% Cell type:code id: tags:
``` python
# hold out the first 5,000 training images for validation, scaling pixel values to the range 0-1
X_valid = X_train_full[:5000] / 255.0
X_train = X_train_full[5000:] / 255.0
y_valid = y_train_full[:5000]
y_train = y_train_full[5000:]
# the test-set features must be scaled in the same way before evaluation
X_test = X_test / 255.0
```
%% Cell type:markdown id: tags:
## Sequential API
%% Cell type:markdown id: tags:
Previously we used a single call to initialize the Neural Network; however, a cleaner way to do this is to initialize the model and then use the `model.add()` method to add layers sequentially. We need to specify that the input is a 28x28 matrix and that the output corresponds to 10 possible categories. The number of hidden layers and the number of neurons in each hidden layer can then be chosen freely.
There is also a Functional API to create models, which allows the creation of more complex Neural Networks. This will be discussed in later notebooks.
%% Cell type:code id: tags:
``` python
model = Sequential()
model.add(Flatten(input_shape=[28, 28]))
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(10, activation="softmax"))
model.summary()
```
%% Cell type:code id: tags:
``` python
model.layers
```
%%%% Output: execute_result
[<keras.layers.core.Flatten at 0x7f91439305d0>,
<keras.layers.core.Dense at 0x7f91439307d0>,
<keras.layers.core.Dense at 0x7f9140a42a90>,
<keras.layers.core.Dense at 0x7f91437a0790>]
%% Cell type:markdown id: tags:
To compile the model we use the loss function `sparse_categorical_crossentropy`, as the targets are mutually exclusive integer labels ranging from 0 to 9. For the Iris data set we instead used one-hot encoding, which created a binary table with each column indicating whether or not an instance corresponds to a particular species; in that case the appropriate loss function is `categorical_crossentropy`.
We will use SGD with the default learning rate of `lr=0.01` and decay rate of `decay=0`.
Finally we can train the model. In this example we use only 50 epochs (complete passes through the training data) and pass explicit validation data; previously, we instead specified what proportion of the training set should be used for validation. Due to the large size of the input data (784 features compared with 4 features for the Iris data), training takes significantly longer. Note that within each epoch the training instances are processed in randomly shuffled mini-batches of 32 instances (the default batch size).
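A compile-and-fit cell matching the settings above might look like this (a sketch, not necessarily the original code):
%% Cell type:code id: tags:
``` python
# sparse categorical cross-entropy for integer labels; SGD with the default learning rate
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=SGD(learning_rate=0.01),
              metrics=["accuracy"])
# train for 50 epochs, monitoring the explicit validation set
history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_valid, y_valid))
```
%% Cell type:markdown id: tags: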
The loss and accuracy of the training and validation sets can now be plotted as a function of the training epoch. The behaviour looks reasonable: there is no significant difference between the training and validation curves, the accuracy is generally increasing, and the loss is generally decreasing.
%% Cell type:code id: tags:
``` python
import pandas as pd             # assumed imports, needed for the plotting below
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.xlabel('Epochs')
plt.title('Evolution of sequential neural network for MNIST');
```
%% Cell type:markdown id: tags:
Finally we can use the testing set to determine the accuracy of the model. This can be done using the model's `evaluate()` method. As can be seen, the accuracy for the test set is consistent with the accuracy of the validation set.
%% Cell type:code id: tags:
``` python
model.evaluate(X_test, y_test)
```
%%%% Output: execute_result
[13.572843551635742, 0.9785000085830688]
%% Cell type:markdown id: tags:
We can also investigate other measures of the accuracy using the actual classifications of the model. The model predicts the probability of each classification; to find the predicted classification we then just need to find the column in each row with the maximum probability, using the function `argmax()`. The precision gives an indication of what percentage of the predictions of a given digit are correct, while the recall gives an indication of what percentage of the actual samples of that digit are predicted correctly. The F1-score gives a weighted average of the precision and recall. For a perfect model all of these would be 1.
The support is the number of actual samples for that digit.
As can be seen, the recall for 3 is the lowest, and the precisions for 8 and 9 are the lowest.
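A sketch of how the predictions and the classification report might be obtained, assuming scikit-learn's metrics module:
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# predicted digit = column with the maximum predicted probability in each row
y_pred = np.argmax(model.predict(X_test), axis=1)
print(classification_report(y_test, y_pred))
```
%% Cell type:markdown id: tags: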
The overall accuracy can also be investigated by plotting the confusion matrix for the classifications. Here we see the same conclusions as from the classification report.
%% Cell type:code id: tags:
``` python
def plt_confusion_matrix(cnf_matrix, cats, method):
    """
    Plots a sklearn confusion matrix with categories 'cats' for a classifier 'method'
    """
    # write the confusion matrix to a dataframe with row and column names as the categories, which are already defined
    df_cm = pd.DataFrame(cnf_matrix, index=cats, columns=cats)
    # one plausible completion of the truncated original: show the counts as an annotated heatmap (assumes seaborn)
    import seaborn as sns
    sns.heatmap(df_cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion matrix for ' + method);
```
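%% Cell type:markdown id: tags:
The function can then be applied to the test-set predictions; a sketch, using the `y_pred` and `confusion_matrix` introduced above, with the digit labels as the categories:
%% Cell type:code id: tags:
``` python
cats = [str(i) for i in range(10)]             # digit labels 0-9
cnf_matrix = confusion_matrix(y_test, y_pred)  # sklearn confusion matrix
plt_confusion_matrix(cnf_matrix, cats, 'sequential neural network')
```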
%% Cell type:markdown id: tags:
Since the models take a while to run, we don't want to repeat the training every time we work with a model. We can therefore use `model.save()` to save all the parameters and hyperparameters of the model in HDF5 (a scientific data format).
%% Cell type:code id: tags:
``` python
model.save("KerasMnistModel.hd5")
!ls
```
%% Cell type:markdown id: tags:
Then next time we can load the model using `load_model()`. As can be seen, this gives the same accuracy on our testing set.
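A minimal sketch of reloading and re-evaluating the saved model:
%% Cell type:code id: tags:
``` python
# reload the saved model and confirm that it reproduces the test-set accuracy
model = keras.models.load_model("KerasMnistModel.hd5")
model.evaluate(X_test, y_test)
```
%% Cell type:markdown id: tags: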
We can also store the model after each epoch. For example, this might be useful on Google Colab if you have a very long training run which exceeds 12 hours: the last state of the model is saved before the run ends, and training can be restarted from this last state. Alternatively, your computer might crash during a training run, and then you can restart the calculation from the last saved state.
To store the model we use `ModelCheckpoint` to create a _callback_ for Keras, which tells the program what to do at the beginning and end of each epoch. We also need to reinitialize the model so that the weights are set randomly; otherwise, the last values of the weights and biases are used.
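A sketch of checkpointing during training (the checkpoint file name is hypothetical, and the model is rebuilt with the same architecture as above):
%% Cell type:code id: tags:
``` python
# save the full model to disk at the end of every epoch (the default behaviour)
checkpoint_cb = keras.callbacks.ModelCheckpoint("KerasMnistCheckpoint.hd5")

# rebuild the model so that the weights are re-initialized randomly
model = Sequential([
    Flatten(input_shape=[28, 28]),
    Dense(300, activation="relu"),
    Dense(100, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=SGD(learning_rate=0.01),
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb])
```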