In our previous notebooks on neural networks we considered classification problems. Neural networks can also handle regression problems, where the output is a continuous value. To demonstrate this we will use the classic [Auto MPG Dataset](https://archive.ics.uci.edu/ml/datasets/auto+mpg) to build a model that predicts the fuel efficiency of 1970s and early 1980s automobiles. This dataset describes the fuel efficiency of many cars from that period using attributes such as engine cylinders, engine displacement, horsepower, and weight.

We will show how linear regression can be implemented using neural networks, and then consider some simple nonlinear regression models.

%% Cell type:markdown id: tags:

## Contents

%% Cell type:markdown id: tags:

* Imports

* The Auto MPG Dataset

* Linear Regression

* Nonlinear Regression

* Model Performance

* Exercises

%% Cell type:markdown id: tags:

## Imports

%% Cell type:markdown id: tags:

We import the standard libraries. If you are running this on Google Colab, and `seaborn` cannot be found, then uncomment the following cell.

%% Cell type:code id: tags:

```

# !pip install -q seaborn

```

%% Cell type:code id: tags:

```

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

import seaborn as sns

```

%% Cell type:markdown id: tags:

We import `tensorflow` and `keras`, the `layers` module for setting up sequential models, and the `preprocessing` module, which provides a layer for normalizing data.

%% Cell type:code id: tags:

```

import tensorflow as tf

from tensorflow import keras

from tensorflow.keras import layers

from tensorflow.keras.layers.experimental import preprocessing

```

%% Cell type:markdown id: tags:

## The Auto MPG Dataset

%% Cell type:markdown id: tags:

We first download and import the dataset from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/) using pandas. The names can be found in the file [auto-mpg.names](https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.names).

First we check for missing values, which are denoted by `?` in the file, and which the import statement has converted to NaN.

%% Cell type:code id: tags:

```

df.isna().sum()

```

%% Cell type:markdown id: tags:

Drop those rows to keep this simple.

%% Cell type:code id: tags:

```

df.dropna(inplace=True)

```

%% Cell type:markdown id: tags:

The `"Origin"` column is categorical, not numeric. So we first use a dictionary to map its numeric codes to region names, and then one-hot encode it with `pd.get_dummies`.

We can view the statistics, to see the range of each variable.

%% Cell type:code id: tags:

```

pd.set_option("display.precision", 2)

df.describe()

```

%% Cell type:markdown id: tags:

We can investigate the correlation between variables. The fuel efficiency (MPG) is strongly correlated with `Cylinders`, `Displacement`, `Horsepower` and `Weight`, and these four variables are also highly correlated with each other.

%% Cell type:code id: tags:

```

corrs = df.corr()  # calculate the correlation table

# as this is a symmetric table, set up a mask so that we only plot values below the main diagonal
mask = np.triu(np.ones_like(corrs, dtype=bool))

f, ax = plt.subplots(figsize=(10, 8))  # initialise the plot and axes

# plot the correlations as a seaborn heatmap, with a colourbar
sns.heatmap(corrs, mask=mask, annot=True, fmt=".2f", cmap="vlag", cbar=True, ax=ax)

# do some fiddling so that the top and bottom rows are not obscured
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5);

```

%% Cell type:markdown id: tags:

We can use `sns.pairplot()` to view the relationship between different features. What we now want to do is create a model which determines the fuel efficiency (MPG) as a function of the other features. It appears that fuel efficiency is approximately inversely proportional to the other variables.

Since we want to predict `MPG`, this is our label, and the other variables are features. We can separate the data into the label and features, and then split both sets into testing and training sets.

%% Cell type:code id: tags:

```
from sklearn.model_selection import train_test_split

# "MPG" is the label; the remaining columns are the features
X = df.drop(columns="MPG")
y = df["MPG"]

# hold back 20% of the data as a test set (an assumed, conventional split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```

%% Cell type:markdown id: tags:

In the table of statistics it is apparent that the values of the features are widely distributed.

%% Cell type:code id: tags:

```

pd.set_option("display.precision", 2)

X_train.describe().loc[['mean', 'std']]

```

%% Cell type:markdown id: tags:

As with classification with neural networks, it is best practice to normalize features that use different scales and ranges. This ensures that techniques such as regularization, which we will consider later, can be applied uniformly.

There is no advantage to normalizing the one-hot features; it is done here for simplicity.

%% Cell type:markdown id: tags:

Previously we used the `mean` and `std` to normalize the variables. Here we will use the Keras `preprocessing.Normalization` layer to build the preprocessing into the model itself.

The first step is to create the layer. Setting `axis=-1` applies the normalization along the last dimension, i.e. to each feature independently.

%% Cell type:code id: tags:

```

normalizer = preprocessing.Normalization(axis=-1)

```

%% Cell type:markdown id: tags:

Then we `.adapt()` it to the data, which calculates the mean and variance of each feature and stores them in the layer.

%% Cell type:code id: tags:

```

normalizer.adapt(np.array(X_train))

```

%% Cell type:code id: tags:

```

print(normalizer.mean.numpy())

```

%% Cell type:markdown id: tags:

When the layer is called it returns the input data, with each feature independently normalized.

%% Cell type:markdown id: tags:

## Linear Regression

%% Cell type:markdown id: tags:

Before building a nonlinear neural network model, we will build a model which implements linear regression. This corresponds to a single neuron with a continuous output.

%% Cell type:markdown id: tags:

First we create a convenience function to build and compile a simple sequential neural network. This is essentially the same as the models we have previously considered for classification, except the last layer is a `Dense` layer, with a single continuous output, and the loss function is the mean absolute error.

For these models we can specify the input normalization layer via `norm`, the number of hidden layers, the number of neurons in each hidden layer, and the learning rate used by SGD.

We start by creating a linear regression model which takes the input `Horsepower` and aims to predict `MPG`.

For our default model the first thing we need to do is create the horsepower `Normalization` layer. This just corresponds to creating an array with the `Horsepower` from the training set, initializing the normalization routine so that the input shape is a single vector (one feature) and then adapting the normalizer to calculate the mean and standard deviation of `Horsepower`.

This layer can then be used as the input for our model. Since we are doing linear regression, we want no hidden layers, so the only trainable layer is the single output neuron. We can then output a summary of the model.

Now the model is configured, we use `Model.fit()` to train it. Here we use 80% of the training data for fitting and 20% for validation. The evolution of the model's metrics is stored in `history`.

%% Cell type:code id: tags:

```

%%time

history = horsepower_model.fit(

X_train['Horsepower'], y_train,

epochs=200,

# suppress logging

verbose=0,

# Calculate validation results on 20% of the training data

validation_split = 0.2)

```

%% Cell type:markdown id: tags:

We will create a simple function for plotting the history of the model.

For this model the training loss and the validation loss decrease steadily, with the training loss always less than the validation loss, as expected.

%% Cell type:code id: tags:

```

plot_loss(history)

```

%% Cell type:markdown id: tags:

We evaluate the results and store them in a structure for comparison with the other models.

Since this is a single-variable linear regression, the output corresponds to a linear relationship, and we can compare the model predictions against the actual values.

%% Cell type:code id: tags:

```

x = tf.linspace(0.0, 250, 251)

y = horsepower_model.predict(x)

```

%% Cell type:markdown id: tags:

We define another convenience function for comparing the predictions.

The predictions are reasonable for mid-range horsepower, but fail at the upper and lower limits.

%% Cell type:code id: tags:

```

plot_horsepower(x,y)

```

%% Cell type:markdown id: tags:

To implement multi-dimensional linear regression, we need only swap in the normalization layer which was defined earlier for the whole data set; the input shape now corresponds to 9 features.

%% Cell type:markdown id: tags:

## Nonlinear Regression

%% Cell type:markdown id: tags:

The previous section implemented linear models for single and multiple inputs.

This section implements single-input and multiple-input Neural Network models. The code is essentially the same except the model is expanded to include hidden nonlinear layers.

These models will contain a few more layers than the linear model:

* The normalization layer.

* Two hidden, nonlinear, `Dense` layers using the `relu` nonlinearity.

* A linear single-output layer.

%% Cell type:markdown id: tags:

We start with a model for the single input "Horsepower". Note that the only difference is the number of hidden layers, and the number of neurons in these layers. However, there are now significantly more trainable parameters.

%% Cell type:markdown id: tags:

## Model Performance

%% Cell type:markdown id: tags:

Now that all the models are trained we can compare their performance. Not surprisingly, as the complexity of the model increases, the absolute error decreases, which suggests that the final model is not overfitting excessively.

The model predicts the fuel efficiency reasonably well, so we can save it for later use.

%% Cell type:code id: tags:

```

nn_model.save('nn_model.h5')

```

%% Cell type:markdown id: tags:

## Exercises

%% Cell type:markdown id: tags:

For these exercises we will investigate regularization techniques to cope with overfitting in the full nonlinear model. There are two standard techniques for dealing with overfitting.

The first is to use L2 (Ridge) or L1 (Lasso) regularization on each layer. These add a penalty term to the objective function which is proportional to the square (L2) or absolute value (L1) of the weights. This penalizes large weights and, in the L1 case, drives many weights to zero. This is analogous to what was previously considered with linear and logistic regression.

The second method is to use dropout layers. In a dropout layer, a randomly chosen fraction of the nodes is ignored at each training iteration. This reduces the sensitivity of the network to the training set and hence produces a more robust model.

The function below generalizes the one created earlier to include L2 Regularization and Dropout layers.