mgcv
Although other approaches are possible, the mgcv package (already installed with R) is probably the best option. In this exercise we will learn the basics of this library. Here is a typical use:
library(mgcv) ## load the library
Loading required package: nlme
This is mgcv 1.9-3. For overview type 'help("mgcv-package")'.
data(trees)
fit <- gam(Volume ~ s(Girth), family = Gamma(link = "log"), data = trees)
Give the mathematical expression for the above model. Do we use a canonical link?
Run the following piece of code and comment. (You may want to have a look at the help page of plot.gam).
plot(fit, residuals = TRUE)
Read the documentation of the gam.check function and run it on the above fitted model and comment.
## Insert code here
Write comment here.
Read the documentation of the s function. What type of spline were we using? Refit the above model using cubic regression splines and compare the two models.
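As a small illustration of the bs argument of s(), here is a sketch on simulated data. The variables x and y, the sample size and the object names are made up for the example and are not part of the exercise. It fits the same smooth with two different bases and compares the fits by AIC.

```r
library(mgcv)

set.seed(1)                        # simulated data, purely illustrative
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
toy <- data.frame(x = x, y = y)

m_tp <- gam(y ~ s(x, bs = "tp"), data = toy)  # thin plate regression spline basis
m_cr <- gam(y ~ s(x, bs = "cr"), data = toy)  # cubic regression spline basis

AIC(m_tp, m_cr)                    # one row per model, with df and AIC
```

The same pattern should carry over to the trees data by swapping in the model formula and family used above.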
## Insert code here
Fit another model which makes use of the two available covariates, i.e., Height and Girth.
## Insert code here
What about a model with a bivariate smoother?
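For reference, mgcv offers two common ways to write a smooth of two covariates: s(x1, x2), an isotropic smooth that treats both covariates on a common scale, and te(x1, x2), a tensor product smooth meant for covariates measured in different units. The sketch below uses simulated data; x1, x2, y and the object names are assumptions made for the illustration.

```r
library(mgcv)

set.seed(2)                        # simulated data, purely illustrative
n <- 300
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- sin(2 * pi * toy$x1) * toy$x2 + rnorm(n, sd = 0.2)

m_iso <- gam(y ~ s(x1, x2), data = toy)   # isotropic bivariate smooth
m_te  <- gam(y ~ te(x1, x2), data = toy)  # tensor product smooth

AIC(m_iso, m_te)                   # compare the two specifications
```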
## Insert code here
Among all the models considered so far, which one is preferable to use?
## Insert code here
Write comments here.
In this exercise we will work with a data set available from data.gouv.fr that records the number of bike rides per day in Nantes (using Naolib). In addition to the total number of rides, we also have access to the number of rides for short-term and long-term subscriptions.
url <- "https://mribatet.perso.math.cnrs.fr/CentraleNantes/Data/trajets-biclooplus-nantes-metropole.csv"
data <- read.csv2(url)
The column Date is currently stored as a character string. Learn how to use the as.Date function to convert it to a Date.
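A minimal sketch of how as.Date behaves, using hypothetical date strings. The actual layout of the Date column is not shown here, so check it first (for instance with head(data$Date)) and adapt the format string accordingly.

```r
# Hypothetical date strings; the real Date column may use another layout
x_iso <- c("2024-01-15", "2024-02-29")
x_fr  <- c("15/01/2024", "29/02/2024")

d_iso <- as.Date(x_iso)                      # ISO year-month-day is parsed without a format
d_fr  <- as.Date(x_fr, format = "%d/%m/%Y")  # other layouts need an explicit format string

class(d_iso)   # class of the converted vector
diff(d_iso)    # Date objects support arithmetic
```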
## Write code here
It's time to do some feature engineering. Clearly, the day of the week and the month are sensible "new" covariates. Create them (and any other fantastic idea you may have).
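One possible way to derive calendar covariates from a Date vector with base R only is sketched below. The toy dates and the column names dow, month and doy are assumptions made for the illustration. The lubridate functions wday(), month() and yday() serve the same purpose.

```r
# Toy dates standing in for the converted Date column
d <- as.Date(c("2024-01-01", "2024-03-16", "2024-12-31"))

toy <- data.frame(
  date  = d,
  dow   = factor(format(d, "%u"), levels = as.character(1:7)),  # 1 = Monday, ..., 7 = Sunday
  month = factor(as.integer(format(d, "%m")), levels = 1:12),   # 1 = January, ..., 12 = December
  doy   = as.integer(format(d, "%j"))                           # day of the year
)
toy
```

Numeric codes are used instead of weekdays() and months() because the latter return locale-dependent labels.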
## Write code here
Using the day of the week and month variables, fit a (sensible) GLM to predict the total number of rides.
## Write code here
Interpret results.
## Write code
Comment here.
Extend your model using a GAM whose linear predictor is \[\eta(t) = \beta_0 + \sum_{j \in \{\text{Tuesday}, \ldots, \text{Sunday}\}} \beta_{1,j} 1_{\{\text{day of week}(t) = j\}} + \sum_{j \in \{\text{Feb}, \ldots, \text{Dec}\}} \beta_{2,j} 1_{\{\text{month}(t) = j\}} + f(\text{day of year}(t))\]
By the way, why do Monday and January not appear in the above equation? Hint: The lubridate package may be helpful.
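A quick way to investigate the question about the missing levels is to look at the design matrix that R builds for a factor. The sketch uses a made-up three-level factor f and R's default treatment contrasts; compare the column names with the levels of f.

```r
# Toy factor with levels "a", "b" and "c"
f <- factor(c("a", "b", "c", "a", "b"))

levels(f)                  # all levels of the factor
X <- model.matrix(~ f)     # design matrix with an intercept
colnames(X)                # columns actually used in the linear predictor
```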
## Write code here
Give comment here.
Perform a model checking stage and comment.
## Write code here
Give comment here.
Plot the fitted spline. Does it make sense?
## Write code here
Give comment here.
Read the documentation of the s function and learn how to use a cyclic spline. Next, improve your model and plot the fitted spline.
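As an illustration of the syntax, here is a sketch of a cyclic smooth fitted to simulated data (not the bike data). The variable names, the simulated signal and the end points 0.5 and 365.5 are assumptions chosen for the example.

```r
library(mgcv)

set.seed(3)                                   # simulated data, purely illustrative
doy <- rep(1:365, times = 2)                  # two "years" of a day-of-year covariate
y   <- 10 + 3 * cos(2 * pi * doy / 365) + rnorm(length(doy))
toy <- data.frame(doy = doy, y = y)

# bs = "cc" requests a cyclic cubic regression spline; the two values passed
# through the knots argument are the end points where the smooth must join up
m_cc <- gam(y ~ s(doy, bs = "cc"), data = toy,
            knots = list(doy = c(0.5, 365.5)))

# Predictions at the two end points, to check that the smooth wraps around
p <- predict(m_cc, newdata = data.frame(doy = c(0.5, 365.5)))
p
```

Placing the end points half a day outside the observed range avoids forcing day 1 and day 365 to have exactly the same fitted value.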
## Write code here
How would you assess whether people are increasingly using the bike-sharing system over time?
## Write code here
Give comments here.
If you have time, think about comparing the behaviour of short-term and long-term subscriptions.
## Write code here