In this lab we will learn how to apply what we have learned during the lecture about linear regression.

1. Just to be sure to understand the maths

In this section we will check if the maths related to fitting a linear model are properly understood. To this aim, we will fit a linear model on simulated data (so that the true model is perfectly known) from scratch and using the dedicated builtin R function.

a) Generate some artificial data

Let’s first simulate some data

set.seed(1)##fix the random seed so that we all have the same results
n.obs <- 100

## Build some features
X1 <- cos(seq(0, 2 * pi, length = n.obs))
X2 <- 1:n.obs

## A design matrix
X <- cbind(1, X1, X2)

## The true parameter beta
beta <- c(2, -2, 0.5)

## The true response
Ytheo <- X %*% beta

## The observed response (true + noise)
Y <- Ytheo + rnorm(n.obs, 0, 0.25)

b) Analysis “from scratch”

  1. Compute the least square estimate \(\hat{\beta}\) (slide 25).
## Fill in this part
  1. Compute the fitted values \(\hat{Y} = X \hat{\beta}\)
## Fill in this part
  1. Compare the fitted values \(\hat Y\) to the observed ones \(Y\).
## Fill in this part

c) Analysis using the dedicated R function

  1. Read the help file of the lm function.
  2. Build a data frame that contains all the features of our linear model.
## Fill in this part
  1. Fit the corresponding linear model using the lm function.
## Fill in this part
  1. Run the following code and comment
plot(fit)##fit is the output of the previous lm call, I'm here to explain each plot ;-)
  1. Use the predict function to make prediction
## Fill in this part

2. Redoing the Galapagos analysis

We move a bit further by analyzing the Galapagos data set of the lecture. The dataset is available from the R package faraway. Try to redo the analysis we did in Slide 26–41.

3. Modelling the monthly total of international airline passengers

The AirPassengers dataset, already included into R, stores the monthly total of international airline passengers from 1949 to 1960.

  1. Plot the data and comment what you see.
  2. Let \(Y_t\) be the number of passenger at time \(t\). Fit the following linear model: \[\begin{equation*} Y_t = \beta_0 + \beta_1 (t - 1955) / 100 + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2), \end{equation*}\] analyze your prediction and give comments about the limitations of your fitted model.

  3. Now do exactly the same but on a log-scale, i.e., \[\begin{equation*} \log Y_t = \beta_0 + \beta_1 (t - 1955) / 100 + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2). \end{equation*}\] Analyze your prediction and give comments about the limitations of your fitted model.

  4. Although we must rather use \(SARIMA\) modelling (beyond the scope of this lecture) to this aim, to model the observed seasonality we can use the following (autoregressive) linear model: \[\begin{equation*} Y_t = \beta_0 + \beta_1 Y_{t-1} + \ldots + \beta_{13} Y_{t-13} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2). \end{equation*}\]

  1. Learn how to use the embed function to store all the features into a single data frame.
  2. Fit the corresponding linear model
  3. Do model selection (see the lecture note for this)
  4. Give a plot showing how your model perform and do prediction for years 1961 and 1962.
LS0tCnRpdGxlOiAiTGluZWFyIFJlZ3Jlc3Npb24gd2l0aCBSIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpJbiB0aGlzIGxhYiB3ZSB3aWxsIGxlYXJuIGhvdyB0byBhcHBseSB3aGF0IHdlIGhhdmUgbGVhcm5lZCBkdXJpbmcgdGhlIGxlY3R1cmUgYWJvdXQgbGluZWFyIHJlZ3Jlc3Npb24uCgojIyAxLiBKdXN0IHRvIGJlIHN1cmUgdG8gdW5kZXJzdGFuZCB0aGUgbWF0aHMKCkluIHRoaXMgc2VjdGlvbiB3ZSB3aWxsIGNoZWNrIGlmIHRoZSBtYXRocyByZWxhdGVkIHRvIGZpdHRpbmcgYSBsaW5lYXIgbW9kZWwgYXJlIHByb3Blcmx5IHVuZGVyc3Rvb2QuIFRvIHRoaXMgYWltLCB3ZSB3aWxsIGZpdCBhIGxpbmVhciBtb2RlbCBvbiBzaW11bGF0ZWQgZGF0YSAoc28gdGhhdCB0aGUgdHJ1ZSBtb2RlbCBpcyBwZXJmZWN0bHkga25vd24pIGZyb20gc2NyYXRjaCBhbmQgdXNpbmcgdGhlIGRlZGljYXRlZCBidWlsdGluICpSKiBmdW5jdGlvbi4KCiMjIyBhKSBHZW5lcmF0ZSBzb21lIGFydGlmaWNpYWwgZGF0YQpMZXQncyBmaXJzdCBzaW11bGF0ZSBzb21lIGRhdGEKYGBge3J9CnNldC5zZWVkKDEpIyNmaXggdGhlIHJhbmRvbSBzZWVkIHNvIHRoYXQgd2UgYWxsIGhhdmUgdGhlIHNhbWUgcmVzdWx0cwpuLm9icyA8LSAxMDAKCiMjIEJ1aWxkIHNvbWUgZmVhdHVyZXMKWDEgPC0gY29zKHNlcSgwLCAyICogcGksIGxlbmd0aCA9IG4ub2JzKSkKWDIgPC0gMTpuLm9icwoKIyMgQSBkZXNpZ24gbWF0cml4ClggPC0gY2JpbmQoMSwgWDEsIFgyKQoKIyMgVGhlIHRydWUgcGFyYW1ldGVyIGJldGEKYmV0YSA8LSBjKDIsIC0yLCAwLjUpCgojIyBUaGUgdHJ1ZSByZXNwb25zZQpZdGhlbyA8LSBYICUqJSBiZXRhCgojIyBUaGUgb2JzZXJ2ZWQgcmVzcG9uc2UgKHRydWUgKyBub2lzZSkKWSA8LSBZdGhlbyArIHJub3JtKG4ub2JzLCAwLCAwLjI1KQpgYGAKCiMjIyBiKSBBbmFseXNpcyAiZnJvbSBzY3JhdGNoIgoKMS4gQ29tcHV0ZSB0aGUgbGVhc3Qgc3F1YXJlIGVzdGltYXRlICRcaGF0e1xiZXRhfSQgKHNsaWRlIDI1KS4KYGBge3J9CiMjIEZpbGwgaW4gdGhpcyBwYXJ0CmBgYAoKMi4gQ29tcHV0ZSB0aGUgZml0dGVkIHZhbHVlcyAkXGhhdHtZfSA9IFggXGhhdHtcYmV0YX0kCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCjMuIENvbXBhcmUgdGhlIGZpdHRlZCB2YWx1ZXMgJFxoYXQgWSQgdG8gdGhlIG9ic2VydmVkIG9uZXMgJFkkLgpgYGB7cn0KIyMgRmlsbCBpbiB0aGlzIHBhcnQKYGBgCgojIyMgYykgQW5hbHlzaXMgdXNpbmcgdGhlIGRlZGljYXRlZCAqUiogZnVuY3Rpb24KCjEuIFJlYWQgdGhlIGhlbHAgZmlsZSBvZiB0aGUgKmxtKiBmdW5jdGlvbi4KMi4gQnVpbGQgYSAqZGF0YSBmcmFtZSogdGhhdCBjb250YWlucyBhbGwgdGhlIGZlYXR1cmVzIG9mIG91ciBsaW5lYXIgbW9kZWwuCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCjMuIEZpdCB0aGUgY29ycmVzcG9uZGluZyBsaW5lYXIgbW9kZWwgdXNpbmcgdGhlICpsbSogZnVuY3Rpb24uCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCjQuIFJ1biB0aGUgZm9sbG93aW5nIGNvZGUgYW5kIGNvbW1lbnQKYGBge3IgZXZhbD1GQUxTRX0KcGxvdChmaXQpIyNmaXQgaXMgdGhlIG91dHB1dCBvZiB0aGUgcHJldmlvdXMgbG0gY2FsbCwgSSdtIGhlcmUgdG8gZXhwbGFpbiBlYWNoIHBsb3QgOy0pCmBgYAoKNS4gVXNlIHRoZSAqcHJlZGljdCogZnVuY3Rpb24gdG8gbWFrZSBwcmVkaWN0aW9uCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCiMjIDIuIFJlZG9pbmcgdGhlIEdhbGFwYWdvcyBhbmFseXNpcwoKV2UgbW92ZSBhIGJpdCBmdXJ0aGVyIGJ5IGFuYWx5emluZyB0aGUgR2FsYXBhZ29zIGRhdGEgc2V0IG9mIHRoZSBsZWN0dXJlLiBUaGUgZGF0YXNldCBpcyBhdmFpbGFibGUgZnJvbSB0aGUgKlIqIHBhY2thZ2UgKmZhcmF3YXkqLiBUcnkgdG8gcmVkbyB0aGUgYW5hbHlzaXMgd2UgZGlkIGluIFNsaWRlIDI2LS00MS4KCiMjIDMuIE1vZGVsbGluZyB0aGUgbW9udGhseSB0b3RhbCBvZiBpbnRlcm5hdGlvbmFsIGFpcmxpbmUgcGFzc2VuZ2VycwoKVGhlICpBaXJQYXNzZW5nZXJzKiBkYXRhc2V0LCBhbHJlYWR5IGluY2x1ZGVkIGludG8gKlIqLCBzdG9yZXMgdGhlIG1vbnRobHkgdG90YWwgb2YgaW50ZXJuYXRpb25hbCBhaXJsaW5lIHBhc3NlbmdlcnMgZnJvbSAxOTQ5IHRvIDE5NjAuCgoxLiBQbG90IHRoZSBkYXRhIGFuZCBjb21tZW50IHdoYXQgeW91IHNlZS4KMi4gTGV0ICRZX3QkIGJlIHRoZSBudW1iZXIgb2YgcGFzc2VuZ2VyIGF0IHRpbWUgJHQkLiBGaXQgdGhlIGZvbGxvd2luZyBsaW5lYXIgbW9kZWw6ClxiZWdpbntlcXVhdGlvbip9CiAgWV90ID0gXGJldGFfMCArIFxiZXRhXzEgKHQgLSAxOTU1KSAvIDEwMCArIFx2YXJlcHNpbG9uX3QsIFxxcXVhZCBcdmFyZXBzaWxvbl90IFxzaW0gTigwLCBcc2lnbWFeMiksClxlbmR7ZXF1YXRpb24qfQphbmFseXplIHlvdXIgcHJlZGljdGlvbiBhbmQgZ2l2ZSBjb21tZW50cyBhYm91dCB0aGUgbGltaXRhdGlvbnMgb2YgeW91ciBmaXR0ZWQgbW9kZWwuCgozLiBOb3cgZG8gZXhhY3RseSB0aGUgc2FtZSBidXQgb24gYSBsb2ctc2NhbGUsIGkuZS4sClxiZWdpbntlcXVhdGlvbip9CiAgXGxvZyBZX3QgPSBcYmV0YV8wICsgXGJldGFfMSAodCAtIDE5NTUpIC8gMTAwICsgXHZhcmVwc2lsb25fdCwgXHFxdWFkIFx2YXJlcHNpbG9uX3QgXHNpbSBOKDAsIFxzaWdtYV4yKS4KXGVuZHtlcXVhdGlvbip9CkFuYWx5emUgeW91ciBwcmVkaWN0aW9uIGFuZCBnaXZlIGNvbW1lbnRzIGFib3V0IHRoZSBsaW1pdGF0aW9ucyBvZiB5b3VyIGZpdHRlZCBtb2RlbC4KCjQuIEFsdGhvdWdoIHdlIG11c3QgcmF0aGVyIHVzZSAkU0FSSU1BJCBtb2RlbGxpbmcgKGJleW9uZCB0aGUgc2NvcGUgb2YgdGhpcyBsZWN0dXJlKSB0byB0aGlzIGFpbSwgdG8gbW9kZWwgdGhlIG9ic2VydmVkIHNlYXNvbmFsaXR5IHdlIGNhbiB1c2UgdGhlIGZvbGxvd2luZyAoYXV0b3JlZ3Jlc3NpdmUpIGxpbmVhciBtb2RlbDoKXGJlZ2lue2VxdWF0aW9uKn0KWV90ID0gXGJldGFfMCArIFxiZXRhXzEgWV97dC0xfSArIFxsZG90cyArIFxiZXRhX3sxM30gWV97dC0xM30gKyBcdmFyZXBzaWxvbl90LCBccXF1YWQgXHZhcmVwc2lsb25fdCBcc2ltIE4oMCwgXHNpZ21hXjIpLgpcZW5ke2VxdWF0aW9uKn0KCiAgYSkgTGVhcm4gaG93IHRvIHVzZSB0aGUgKmVtYmVkKiBmdW5jdGlvbiB0byBzdG9yZSBhbGwgdGhlIGZlYXR1cmVzIGludG8gYSBzaW5nbGUgZGF0YSBmcmFtZS4KICBiKSBGaXQgdGhlIGNvcnJlc3BvbmRpbmcgbGluZWFyIG1vZGVsCiAgYykgRG8gbW9kZWwgc2VsZWN0aW9uIChzZWUgdGhlIGxlY3R1cmUgbm90ZSBmb3IgdGhpcykKICBkKSBHaXZlIGEgcGxvdCBzaG93aW5nIGhvdyB5b3VyIG1vZGVsIHBlcmZvcm0gYW5kIGRvIHByZWRpY3Rpb24gZm9yIHllYXJzIDE5NjEgYW5kIDE5NjIuCgoKCgo=