In this lab, we will see how to apply what we have learned about logistic regression.

1. Just to be sure to understand the math

In this section we will check if the maths related to logistic regression are properly understood. To this aim, we will fit a logistic regression model on simulated data (so that the true model is perfectly known) from scratch and using the dedicated builtin R function.

a) Generate some artificial data

Let’s first simulate some data

set.seed(1)##fix the random seed so that we all have the same results
n.obs <- 100

## Build some features
X1 <- cos(seq(0, 2 * pi, length = n.obs))
X2 <- 1:n.obs / n.obs

## A design matrix
X <- cbind(1, X1, X2)

## The true parameter beta
beta <- c(2, -2, 0.5)

## The true response
odds <- exp(X %*% beta)
ptheo <- odds / (1 + odds)

## The observed response (0, 1)
Y <- rbinom(n.obs, 1, ptheo)
table(Y)
Y
 0  1 
15 85 

b) Analysis “from scratch”

  1. Try to understand every lines of the code below (which computes the negative log-likelihood for a logistic regression model)
nllik <- function(par, features, response){## nllik stands for Negative Log Likelihood
  ## par is the vector of parameters, i.e., beta_0, beta_1 and beta_3 here
  ## features is the data.frame storing features
  ## response is the observed response (binary response here)
  
  dsgn.mat <- cbind(1, features)
  log.odds <- dsgn.mat %*% par
  odds <- exp(log.odds)
  proba <- odds / (1 + odds)
  
  return(-sum(log1p(-proba) + response * log.odds))
}

## A typical call to the nllik function
nllik(c(1, 0, 0), features = cbind(X1, X2), response = Y)
[1] 46.32617
  1. Learn how to use the nlm function to retrieve the maximum likelihood estimate
## Fill in this part
  1. Tweak your previous line of code to get standard errors as well
## Fill in this part
  1. Have a look at the function glm and use it to fit our logistic model.
## Fill in this part

c) Analysis using the dedicated R function

  1. Read the help file of the glm function.
  2. Fit the corresponding logistic regression model using the glm function.
## Fill in this part
  1. Run the following code and comment
predict(fit)##fit is the output of the previous glm call, I'm here to explain each plot ;-)
predict(fit, type = "response")
  1. Give the odds for each feature
## Fill in this part

2. Redoing the Heart Disease in South Africa analysis

We move a bit further by analyzing the Heart Disease in South Africa analysis of the lecture. The dataset is available from the here. Try to redo the analysis we did during the lecture.

3. Analysing the Default data set

In this section we will analyze the Default dataset (available from here). The main goal of this study is to predict whether an individual will default on his or her credit card payment, on the basis of annual income, monthly credit card balance and if the individual is a student or not.

  1. Familiarize yourself with the data and produce some (useful!) exploratory plots and summary statistics
  2. Fit a first logistic regression model and analyze your fitted model.
  3. According to your fitted model, is a student less or more likely to default (for a fixed value of balance and income)? Is this in agreement with your investigations on question 1?
  4. What is the probability of default for a student having a credit card balance of 650$ and an income of 17000$?
LS0tCnRpdGxlOiAiTG9naXN0aWMgcmVncmVzc2lvbiIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKSW4gdGhpcyBsYWIsIHdlIHdpbGwgc2VlIGhvdyB0byBhcHBseSB3aGF0IHdlIGhhdmUgbGVhcm5lZCBhYm91dCBsb2dpc3RpYyByZWdyZXNzaW9uLgoKIyMgMS4gSnVzdCB0byBiZSBzdXJlIHRvIHVuZGVyc3RhbmQgdGhlIG1hdGgKCkluIHRoaXMgc2VjdGlvbiB3ZSB3aWxsIGNoZWNrIGlmIHRoZSBtYXRocyByZWxhdGVkIHRvIGxvZ2lzdGljIHJlZ3Jlc3Npb24gYXJlIHByb3Blcmx5IHVuZGVyc3Rvb2QuIFRvIHRoaXMgYWltLCB3ZSB3aWxsIGZpdCBhIGxvZ2lzdGljIHJlZ3Jlc3Npb24gbW9kZWwgb24gc2ltdWxhdGVkIGRhdGEgKHNvIHRoYXQgdGhlIHRydWUgbW9kZWwgaXMgcGVyZmVjdGx5IGtub3duKSBmcm9tIHNjcmF0Y2ggYW5kIHVzaW5nIHRoZSBkZWRpY2F0ZWQgYnVpbHRpbiAqUiogZnVuY3Rpb24uCgojIyMgYSkgR2VuZXJhdGUgc29tZSBhcnRpZmljaWFsIGRhdGEKTGV0J3MgZmlyc3Qgc2ltdWxhdGUgc29tZSBkYXRhCmBgYHtyfQpzZXQuc2VlZCgxKSMjZml4IHRoZSByYW5kb20gc2VlZCBzbyB0aGF0IHdlIGFsbCBoYXZlIHRoZSBzYW1lIHJlc3VsdHMKbi5vYnMgPC0gMTAwCgojIyBCdWlsZCBzb21lIGZlYXR1cmVzClgxIDwtIGNvcyhzZXEoMCwgMiAqIHBpLCBsZW5ndGggPSBuLm9icykpClgyIDwtIDE6bi5vYnMgLyBuLm9icwoKIyMgQSBkZXNpZ24gbWF0cml4ClggPC0gY2JpbmQoMSwgWDEsIFgyKQoKIyMgVGhlIHRydWUgcGFyYW1ldGVyIGJldGEKYmV0YSA8LSBjKDIsIC0yLCAwLjUpCgojIyBUaGUgdHJ1ZSByZXNwb25zZQpvZGRzIDwtIGV4cChYICUqJSBiZXRhKQpwdGhlbyA8LSBvZGRzIC8gKDEgKyBvZGRzKQoKIyMgVGhlIG9ic2VydmVkIHJlc3BvbnNlICgwLCAxKQpZIDwtIHJiaW5vbShuLm9icywgMSwgcHRoZW8pCnRhYmxlKFkpCmBgYAoKIyMjIGIpIEFuYWx5c2lzICJmcm9tIHNjcmF0Y2giCgoxLiBUcnkgdG8gdW5kZXJzdGFuZCBldmVyeSBsaW5lcyBvZiB0aGUgY29kZSBiZWxvdyAod2hpY2ggY29tcHV0ZXMgdGhlIG5lZ2F0aXZlIGxvZy1saWtlbGlob29kIGZvciBhIGxvZ2lzdGljIHJlZ3Jlc3Npb24gbW9kZWwpCmBgYHtyfQpubGxpayA8LSBmdW5jdGlvbihwYXIsIGZlYXR1cmVzLCByZXNwb25zZSl7IyMgbmxsaWsgc3RhbmRzIGZvciBOZWdhdGl2ZSBMb2cgTGlrZWxpaG9vZAogICMjIHBhciBpcyB0aGUgdmVjdG9yIG9mIHBhcmFtZXRlcnMsIGkuZS4sIGJldGFfMCwgYmV0YV8xIGFuZCBiZXRhXzMgaGVyZQogICMjIGZlYXR1cmVzIGlzIHRoZSBkYXRhLmZyYW1lIHN0b3JpbmcgZmVhdHVyZXMKICAjIyByZXNwb25zZSBpcyB0aGUgb2JzZXJ2ZWQgcmVzcG9uc2UgKGJpbmFyeSByZXNwb25zZSBoZXJlKQogIAogIGRzZ24ubWF0IDwtIGNiaW5kKDEsIGZlYXR1cmVzKQogIGxvZy5vZGRzIDwtIGRzZ24ubWF0ICUqJSBwYXIKICBvZGRzIDwtIGV4cChsb2cub2RkcykKICBwcm9iYSA8LSBvZGRzIC8gKDEgKyBvZGRzKQogIAogIHJldHVybigtc3VtKGxvZzFwKC1wcm9iYSkgKyByZXNwb25zZSAqIGxvZy5vZGRzKSkKfQoKIyMgQSB0eXBpY2FsIGNhbGwgdG8gdGhlIG5sbGlrIGZ1bmN0aW9uCm5sbGlrKGMoMSwgMCwgMCksIGZlYXR1cmVzID0gY2JpbmQoWDEsIFgyKSwgcmVzcG9uc2UgPSBZKQpgYGAKCjIuIExlYXJuIGhvdyB0byB1c2UgdGhlICpubG0qIGZ1bmN0aW9uIHRvIHJldHJpZXZlIHRoZSBtYXhpbXVtIGxpa2VsaWhvb2QgZXN0aW1hdGUKYGBge3J9CiMjIEZpbGwgaW4gdGhpcyBwYXJ0CmBgYAoKMy4gVHdlYWsgeW91ciBwcmV2aW91cyBsaW5lIG9mIGNvZGUgdG8gZ2V0IHN0YW5kYXJkIGVycm9ycyBhcyB3ZWxsCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCjQuIEhhdmUgYSBsb29rIGF0IHRoZSBmdW5jdGlvbiAqZ2xtKiBhbmQgdXNlIGl0IHRvIGZpdCBvdXIgbG9naXN0aWMgbW9kZWwuCmBgYHtyfQojIyBGaWxsIGluIHRoaXMgcGFydApgYGAKCiMjIyBjKSBBbmFseXNpcyB1c2luZyB0aGUgZGVkaWNhdGVkICpSKiBmdW5jdGlvbgoKMS4gUmVhZCB0aGUgaGVscCBmaWxlIG9mIHRoZSAqZ2xtKiBmdW5jdGlvbi4KMi4gRml0IHRoZSBjb3JyZXNwb25kaW5nIGxvZ2lzdGljIHJlZ3Jlc3Npb24gbW9kZWwgdXNpbmcgdGhlICpnbG0qIGZ1bmN0aW9uLgpgYGB7cn0KIyMgRmlsbCBpbiB0aGlzIHBhcnQKYGBgCgo0LiBSdW4gdGhlIGZvbGxvd2luZyBjb2RlIGFuZCBjb21tZW50CmBgYHtyIGV2YWw9RkFMU0V9CnByZWRpY3QoZml0KSMjZml0IGlzIHRoZSBvdXRwdXQgb2YgdGhlIHByZXZpb3VzIGdsbSBjYWxsLCBJJ20gaGVyZSB0byBleHBsYWluIGVhY2ggcGxvdCA7LSkKcHJlZGljdChmaXQsIHR5cGUgPSAicmVzcG9uc2UiKQpgYGAKCjUuIEdpdmUgdGhlIG9kZHMgZm9yIGVhY2ggZmVhdHVyZQpgYGB7cn0KIyMgRmlsbCBpbiB0aGlzIHBhcnQKYGBgCgojIyAyLiBSZWRvaW5nIHRoZSBIZWFydCBEaXNlYXNlIGluIFNvdXRoIEFmcmljYSBhbmFseXNpcwoKV2UgbW92ZSBhIGJpdCBmdXJ0aGVyIGJ5IGFuYWx5emluZyB0aGUgSGVhcnQgRGlzZWFzZSBpbiBTb3V0aCBBZnJpY2EgYW5hbHlzaXMgb2YgdGhlIGxlY3R1cmUuIFRoZSBkYXRhc2V0IGlzIGF2YWlsYWJsZSBmcm9tIHRoZSBbaGVyZV0oaHR0cDovL21yaWJhdGV0LnBlcnNvLm1hdGguY25ycy5mci9DZW50cmFsZU5hbnRlcy9GYXN0VHJhY2svU291dGhBZnJpY2FuSGVhcnQudHh0KS4gVHJ5IHRvIHJlZG8gdGhlIGFuYWx5c2lzIHdlIGRpZCBkdXJpbmcgdGhlIGxlY3R1cmUuCgojIyAzLiBBbmFseXNpbmcgdGhlICpEZWZhdWx0KiBkYXRhIHNldAoKSW4gdGhpcyBzZWN0aW9uIHdlIHdpbGwgYW5hbHl6ZSB0aGUgKkRlZmF1bHQqIGRhdGFzZXQgKGF2YWlsYWJsZSBmcm9tIFtoZXJlXShodHRwOi8vbXJpYmF0ZXQucGVyc28ubWF0aC5jbnJzLmZyL0NlbnRyYWxlTmFudGVzL0Zhc3RUcmFjay9EZWZhdWx0cy50eHQpKS4gVGhlIG1haW4gZ29hbCBvZiB0aGlzIHN0dWR5IGlzIHRvIHByZWRpY3Qgd2hldGhlciBhbiBpbmRpdmlkdWFsIHdpbGwgZGVmYXVsdCBvbiBoaXMgb3IgaGVyIGNyZWRpdCBjYXJkIHBheW1lbnQsIG9uIHRoZSBiYXNpcyBvZiBhbm51YWwgaW5jb21lLCBtb250aGx5IGNyZWRpdCBjYXJkIGJhbGFuY2UgYW5kIGlmIHRoZSBpbmRpdmlkdWFsIGlzIGEgc3R1ZGVudCBvciBub3QuCgoxLiBGYW1pbGlhcml6ZSB5b3Vyc2VsZiB3aXRoIHRoZSBkYXRhIGFuZCBwcm9kdWNlIHNvbWUgKHVzZWZ1bCEpIGV4cGxvcmF0b3J5IHBsb3RzIGFuZCBzdW1tYXJ5IHN0YXRpc3RpY3MKMi4gRml0IGEgZmlyc3QgbG9naXN0aWMgcmVncmVzc2lvbiBtb2RlbCBhbmQgYW5hbHl6ZSB5b3VyIGZpdHRlZCBtb2RlbC4KMy4gQWNjb3JkaW5nIHRvIHlvdXIgZml0dGVkIG1vZGVsLCBpcyBhIHN0dWRlbnQgbGVzcyBvciBtb3JlIGxpa2VseSB0byBkZWZhdWx0IChmb3IgYSBmaXhlZCB2YWx1ZSBvZiAqYmFsYW5jZSogYW5kICppbmNvbWUqKT8gSXMgdGhpcyBpbiBhZ3JlZW1lbnQgd2l0aCB5b3VyIGludmVzdGlnYXRpb25zIG9uIHF1ZXN0aW9uIDE/CjQuIFdoYXQgaXMgdGhlIHByb2JhYmlsaXR5IG9mIGRlZmF1bHQgZm9yIGEgc3R1ZGVudCBoYXZpbmcgYSBjcmVkaXQgY2FyZCBiYWxhbmNlIG9mIDY1MFwkIGFuZCBhbiBpbmNvbWUgb2YgMTcwMDBcJD8K