---
title: "Neural Network"
output: html_notebook
---
```{r echo=FALSE}
library(keras)
```

In this lab, we will see how to apply what we have learned about neural networks.

## 0. Set up your workstation 

As we saw during the lecture, we will use *keras* on top of *tensorflow* to train neural networks. Hence we need to install it:
```{r eval=FALSE}
install.packages("keras")
```
Next, we load the library and install everything required outside of R (a Python environment, neural-net frameworks, ...):
```{r eval=FALSE}
library(keras)
install_keras()
```

If everything went fine, you are now all set to train (deep) neural nets!

## 1. The basics

Fitting a neural network using *keras* is a 3-step procedure:

  - Build the network architecture;
  - Define the loss function and the optimization scheme;
  - Fit it based on some training data.
  
We will cover these in turn.

### a) Defining the architecture

Although many architectures are available, in our (introductory) lecture, we only presented feedforward and convolutional neural nets. We will now see how to define such architectures.

```{r}
## A feedforward neural net
feedforward <- keras_model_sequential() %>%
  layer_dense(input_shape = 50, units = 8, activation = "relu") %>%
  layer_dense(units = 2, activation = "relu") %>%
  layer_dense(units = 1)

## A convolutional neural net
convnet <- keras_model_sequential() %>%
  layer_conv_2d(filters = 64, kernel_size = 3, activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
```

The code is, I think, verbose enough to make clear what is going on here. Try to answer the following questions.

  - What is the architecture of these neural nets?
  - What does it mean to set *input_shape = 50* or *input_shape = c(28, 28, 1)*?
  - Why is there no *activation* in the last *layer_dense* call of model *feedforward*?
  - Why do we set *activation = "softmax"* in the last *layer_dense* call of model *convnet*?
  
Have a look at the following outputs
```{r}
feedforward
convnet
```
  - What is printed here?
  - Try to compute "by hand" the number of parameters.
```{r}
## Fill in this part
```

### b) Defining the optimization stage

The optimization stage in *keras* is called *compiling* and is very straightforward
```{r}
feedforward %>% compile(optimizer = optimizer_rmsprop(),
                        loss = "mse")
```

  - Which optimizer are we using here?
  - Why are we using the *mse* loss here?
  - Which loss would you specify for the *convnet* model?

Possible options for the *optimizer* argument are:

  - *optimizer_adadelta*: adaptive learning-rate optimizer
  - *optimizer_adagrad*: adaptive subgradient optimization
  - *optimizer_adam*: a kind of mix between rmsprop and momentum
  - *optimizer_adamax*: adam but with the sup-norm
  - *optimizer_nadam*: adam with Nesterov acceleration
  - *optimizer_rmsprop*: the famous rmsprop optimizer
  - *optimizer_sgd*: stochastic gradient descent
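
Each of these constructors accepts tuning parameters. As a minimal sketch (note that the learning-rate argument is named *learning_rate* in recent releases of the R *keras* package, but *lr* in older ones — check your installed version):

```r
## Compile with an explicitly tuned optimizer instead of the defaults
feedforward %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.001),
  loss = "mse"
)
```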

Some widely used options (others are possible) for the *loss* argument are:

  - *'mse'*: used mainly for regression;
  - *'categorical_crossentropy'*: used mainly for $K$-class classification ($K > 2$);
  - *'binary_crossentropy'*: used mainly for binary classification.
  
We can also add an additional argument *metrics*, e.g., *metrics = "accuracy"*, so that some accuracy measure is reported in addition to the evolution of the (generalization) error during the optimization stage.
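
As a sketch, re-compiling the *feedforward* model so that the mean absolute error is tracked alongside the loss (the *"mae"* metric name is a standard *keras* alias):

```r
## Track an extra metric during training; it is reported next to the loss
feedforward %>% compile(optimizer = optimizer_rmsprop(),
                        loss = "mse",
                        metrics = "mae")
```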
  
### c) Fitting the neural network

The final stage (well, actually, in an operational environment we will then fine-tune our network) consists in fitting the network. Here we suppose we have some (scaled) features *X_train* and a response *Y_train*. Training a neural net, here *feedforward*, is as easy as 1, 2, 3:

```{r eval = FALSE}
history <- feedforward %>% fit(X_train, Y_train, epochs = 30)
plot(history) ## Always a good idea to see how the optimization procedure performs
```

Although the **training** error is of interest, it is often highly recommended to plot the **test error** as well. To do so, suppose we also have an *X_test* and *Y_test* dataset
```{r eval=FALSE}
history <- feedforward %>% fit(X_train, Y_train, epochs = 30, verbose = 0,
                                validation_data = list(X_test, Y_test))
plot(history)
```

Another option would be to ask *keras* to split the data set
```{r eval = FALSE}
history <- feedforward %>% fit(X, Y, epochs = 30, verbose = 0,
                                validation_split = 0.2)
plot(history)
```
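
Once validation data are monitored, a common refinement is to stop training when the validation loss stops improving, via the *callback_early_stopping* function. A sketch (not run here; the *patience* value is illustrative):

```r
## Stop once the validation loss has not improved for 5 consecutive epochs
history <- feedforward %>% fit(X, Y, epochs = 100, verbose = 0,
                               validation_split = 0.2,
                               callbacks = list(callback_early_stopping(patience = 5)))
```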


### d) Making predictions

Depending on the situation, i.e., regression or classification, prediction can be done in two different ways:
```{r eval = FALSE}
feedforward %>% predict(X_new)
convnet %>% predict_classes(X_new)
```
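
For a classification model, *predict* returns one row of class probabilities per observation, and *predict_classes* returns the per-row argmax of those probabilities. A sketch of the equivalence in base R (the `- 1` assumes classes are labelled from 0, as for MNIST):

```r
probs <- convnet %>% predict(X_new)       ## one row of class probabilities per image
labels <- apply(probs, 1, which.max) - 1  ## argmax per row; shift since classes start at 0
```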


## 1.5 Would you recognize it?

Have a look at the following neural network:
```{r eval=FALSE}
model <- keras_model_sequential() %>%
  layer_dense(input_shape = n.feat, units = 1,
              activation = "sigmoid")

model %>% compile(optimizer = optimizer_rmsprop(),
                  loss = "binary_crossentropy")
```

This model, rewritten within a neural network framework, is very famous. Have you recognized it?

Fit this model to a sensible dataset already studied in class, and also fit it using the "conventional" way. Then compare the estimates.
```{r}
## Fill in this part
```


## 2. Playing with the MNIST dataset

### a) Getting the data and preprocessing it
The *MNIST* dataset was a standard benchmark for machine learning methodologies ten years ago. Nowadays it is considered a bit too simple, but it is the perfect dataset to start feeling comfortable training neural networks.

This dataset consists of grey-scale images of handwritten digits---see a sample of this dataset below.

Using the dedicated *keras* function we can easily retrieve the dataset:
```{r eval=FALSE}
mnist <- dataset_mnist()
```
and retrieve the *training* and *test* datasets (and scale the features to lie in $[0,1]$)
```{r eval = FALSE}
x_train <- mnist$train$x / 255
y_train <- mnist$train$y
x_test <- mnist$test$x / 255
y_test <- mnist$test$y
```
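
Note that each element of *x_train* is a 3d array of images, while a feedforward net expects flat feature vectors and *layer_conv_2d* expects an explicit channel dimension. A minimal reshaping sketch using the *array_reshape* function shipped with *keras*:

```r
## Flat 784-long vectors for a feedforward net
x_train_flat <- array_reshape(x_train, dim = c(nrow(x_train), 28 * 28))

## Add a (single) channel dimension for a convnet
x_train_conv <- array_reshape(x_train, dim = c(nrow(x_train), 28, 28, 1))
```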

1. Print the dimension of each image
```{r}
## fill in this part
```
2. Run the following piece of code to show a sample of these images
```{r eval=FALSE}
par(mfrow = c(2, 4), mar = rep(0, 4), bty = "n")
idx <- sample(10^4, 8)

for (i in idx)
  image(t(x_test[i,28:1,]), axes = FALSE, col = rev(grey.colors(255)))
```

3. Before feeding the neural net with these data we need to reformat them to the format required by *keras*, i.e., the response (which is categorical) must be *one-hot encoded*. Learn how to use the *to_categorical* function to *one-hot encode* the response.
```{r}
## Fill in this part
```

What is the dimension of *y_test* now?
```{r}
## fill in this part and make sure you understand what is behind the 'to_categorical' function
```

4. Build a first (simple) feedforward neural network for the MNIST dataset. I expect you to reach at least a 99.2% accuracy on the test dataset. Watch out: since it is very likely that you will fine-tune your model, please use the *train + validation + test* splitting scheme.

```{r}
## Fill in this part
```

5. Try to build a convolutional neural network.

```{r}
## Fill in this part
```

6. Add a max-pooling layer to your convnet model.

```{r}
## Fill in this part
```

7. Learn how to use the *regularizer_l2* function to regularize your feedforward model.
```{r}
## Fill in this part
```

8. Learn how to use the *layer_dropout* function to regularize your convnet model.


## 3. Using pre-trained models

In an operational environment, you will often use a pre-existing network and (possibly) slightly alter its last layers to fit your dataset. For instance, this can mean adding another layer, or estimating the weights and biases of the last layer on your dataset while keeping the other ones "frozen". Such a procedure is called *using pre-trained models*, and parameter estimates for various famous neural nets are freely available.

### a) Prediction using already fitted network

Here we start by simply using an already-trained neural net: the *ResNet50* network
```{r eval=FALSE}
resnet <- application_resnet50() ## trained on ImageNet

## Get a nice new image
url <- "cat.jpg"

img <- image_load(url, target_size = c(224, 224)) %>% ## import the image in PIL format
  image_to_array() %>% ## convert it to a tensor
  array_reshape(dim = c(1, 224, 224, 3)) %>% ## store it as a 4d tensor
  imagenet_preprocess_input() ## preprocess the image the same way the ImageNet inputs were

preds <- resnet %>% predict(img)
dim(preds)

imagenet_decode_predictions(preds)
```

In the code above:

  - Why do we set *target_size = c(224, 224)*?
  - Why is the dimension of *preds* $1 \times 1000$?

Download a picture of your choice and try to predict what it is using the pre-trained *VGG16* net.
```{r}
## Fill in here
```

### b) Slightly alter a pre-trained model

In this section we briefly show how one can slightly alter a pre-trained model, but we won't run any code; this part is just here to show you how to do it. Here we will fine-tune the *InceptionV3* neural net.
```{r eval=FALSE}
base.model <- application_inception_v3(include_top = FALSE) ## trained on ImageNet

new.layers <- base.model$output %>%
  layer_global_average_pooling_2d() %>% ## collapse the spatial dimensions of the base output
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 256, activation = "softmax")

altered.model <- keras_model(inputs = base.model$input,
                             outputs = new.layers)

## Since we want to estimate the parameters of the new layers only, we "freeze" the original weights
freeze_weights(base.model)

## Now we compile the model
altered.model %>% compile(optimizer = optimizer_rmsprop(),
                          loss = "categorical_crossentropy")

## And finally based on new data we fit the "unfrozen" parameters
altered.model %>% fit(X_new, Y_new, epochs = 10)
```
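
After this first training pass, a common second stage is to unfreeze some of the top layers of the base model and retrain with a much smaller learning rate. A hedged sketch using the *unfreeze_weights* companion of *freeze_weights* (the layer index passed to *from* is purely illustrative; inspect `summary(base.model)` to pick a sensible one, and note the learning-rate argument is *lr* in older *keras* releases):

```r
## Unfreeze the base model from a given layer onwards (index is illustrative)
unfreeze_weights(base.model, from = 200)

## Re-compile (required after changing trainability), with a small learning rate
altered.model %>% compile(optimizer = optimizer_rmsprop(learning_rate = 1e-5),
                          loss = "categorical_crossentropy")

## Continue fitting on the new data
altered.model %>% fit(X_new, Y_new, epochs = 10)
```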
