The R framework has, strictly speaking, no built-in ability to perform survival analysis. The survival package fills this gap. To use it, you must load it each time you launch a new session by invoking

library(survival)

One of the main characteristics of survival data is censoring. How do we deal with such data using the survival package?

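The main function for handling such data is Surv, which apparently does nothing except that the resulting object now belongs to the class Surv. Since the modelling functions of the package (such as survfit and coxph) expect a Surv object as their response, you will use it extensively in your analyses.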
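As a small illustration, the sketch below builds a Surv object from hypothetical toy data (five made-up follow-up times), where status 1 marks an observed failure and 0 a right-censored time. It shows that Surv mostly repackages the data and tags it with the class Surv.

```r
library(survival)

# Hypothetical toy data: follow-up times and event indicators
# (status = 1: failure observed, status = 0: right censored)
time <- c(5, 8, 12, 3, 9)
status <- c(1, 0, 1, 1, 0)

y <- Surv(time, status)
class(y)          # the object now has class "Surv"
y                 # censored times are printed with a trailing "+"
str(unclass(y))   # underneath: a two-column matrix (time, status)
```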

1. Handling right-censored data

2. Generator fans

We are going to work with the genfan data included in the survival package.
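A possible first look at these data is sketched below, assuming (as in the Cox example later in this sheet) that genfan has the columns hours and status. It fits a single Kaplan–Meier curve; this is one way to start, not the expected answer.

```r
library(survival)

# genfan: failure/censoring times of generator fans (columns hours and status)
str(genfan)
# help("genfan", package = "survival") opens its documentation

# Kaplan–Meier estimate of the fan survival function
km_fan <- survfit(Surv(hours, status) ~ 1, data = genfan)
km_fan            # number of fans, number of failures, median (if reached)
summary(km_fan)   # estimate at each failure time with its confidence interval
plot(km_fan, xlab = "Hours", ylab = "Survival probability")
```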

3. Motor insulation

We are now going to work with the imotor data included in the survival package.

  • Load the data imotor and have a look at its documentation.
  • For each temperature, fit a survival function using the Kaplan–Meier estimator.
  • If, in the previous question, you made several calls to the survfit function, try using an R formula to do it in a single call.
  • Analyze the outputs of the summary and print functions when applied to the Kaplan–Meier estimates you just fitted.
  • Plot the estimated survival curves.
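The steps above can be sketched as follows. The column names temp, time and status are an assumption here (check them with str() and the documentation). The key point is the formula `~ temp`, which makes a single survfit() call produce one Kaplan–Meier curve per temperature.

```r
library(survival)

# imotor: motor insulation failure times at several temperatures
# (column names temp, time and status assumed; check with str())
str(imotor)

# One survfit() call: "~ temp" gives one Kaplan–Meier curve per temperature
km_temp <- survfit(Surv(time, status) ~ temp, data = imotor)

print(km_temp)    # one row per temperature: n, events, median survival
summary(km_temp)  # estimates at each failure time, stratum by stratum

k <- length(km_temp$strata)   # number of temperature levels
plot(km_temp, col = seq_len(k), lty = seq_len(k),
     xlab = "Time", ylab = "Survival probability")
legend("bottomleft", legend = names(km_temp$strata),
       col = seq_len(k), lty = seq_len(k))
```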

4. Confidence intervals

Recall that a widely used confidence interval of level \(1 - \alpha\) for some parameter \(\theta\) is the symmetric one, typically of the form \[ \left[\hat{\theta} - z_{1 - \alpha / 2} \sqrt{\mbox{Var}(\hat{\theta})}, \hat{\theta} + z_{1 - \alpha / 2} \sqrt{\mbox{Var}(\hat{\theta})} \right], \]
where \(\hat{\theta}\) is an estimator of \(\theta\), assumed here to be at least asymptotically normal with standard error \(\sqrt{\mbox{Var}(\hat{\theta})}\), and \(z_{1 - \alpha / 2}\) is the \((1 - \alpha / 2)\)-quantile of the standard normal distribution.
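One way to see this formula at work is to apply it pointwise to a Kaplan–Meier estimate. The sketch below defines a small helper, sym_ci (a hypothetical name, not part of the survival package), and feeds it the standard errors of \(\hat{S}(t)\) reported by summary() for the genfan data.

```r
library(survival)

# Symmetric (Wald-type) interval: theta_hat -/+ z_{1 - alpha/2} * se
sym_ci <- function(theta_hat, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  cbind(lower = theta_hat - z * se, upper = theta_hat + z * se)
}

# Pointwise intervals for the Kaplan–Meier estimate of the genfan data
km <- survfit(Surv(hours, status) ~ 1, data = genfan)
s <- summary(km)   # s$std.err: standard error of the survival estimate
ci <- sym_ci(s$surv, s$std.err)
head(cbind(time = s$time, surv = s$surv, ci))
```

Such symmetric intervals can fall outside \([0, 1]\); this is one reason why survfit offers transformed intervals through its conf.type argument (the default is "log").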

5. How dumb see survivals? (Optional)

In this exercise, we are going to see what happens if we are very dumb and completely ignore censoring. To this end, we will work on simulated data so that we “know the truth” (and thus see how dumb we are!). Recall that right censoring corresponds to the situation where we observe realizations of the random variable \[T = \min(C, T_*),\] together with the censoring indicator \(\delta = \mathbf{1}\{T_* \le C\}\), where \(C\) is a random variable related to censoring and \(T_*\) is the time to failure (only the latter is of interest). Throughout this exercise, we will assume that \(T_*\) and \(C\) are independent and exponentially distributed with parameters (rates) \(\lambda_*\) and \(\lambda_c\) respectively.
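Under these assumptions, the distribution of the observed times is easy to work out. By independence of \(C\) and \(T_*\),
\[ P(T > t) = P(T_* > t)\, P(C > t) = e^{-\lambda_* t}\, e^{-\lambda_c t} = e^{-(\lambda_* + \lambda_c) t}, \qquad t \ge 0, \]
so \(T\) is itself exponential with rate \(\lambda_* + \lambda_c\). An analysis that treats every observed time as a failure therefore estimates \(e^{-(\lambda_* + \lambda_c) t}\) instead of the survival function \(e^{-\lambda_* t}\) of \(T_*\), i.e. it overestimates the failure rate.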

Without loss of generality, in our simulation study, we will assume that \(\lambda_c = 1\) and will only vary \(\lambda_*\).
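A minimal sketch of such a simulation study is given below. The sample size, the seed and the value \(\lambda_* = 0.5\) are arbitrary choices for illustration. It compares the “dumb” estimate (every observed time treated as a failure) with the Kaplan–Meier estimate and the true survival curve, and also contrasts the naive exponential rate estimate \(1/\bar{T}\) with the censoring-aware maximum likelihood estimate (number of failures divided by total time at risk).

```r
library(survival)
set.seed(42)

n <- 2000
lambda_star <- 0.5   # failure rate: the parameter to vary
lambda_c <- 1        # censoring rate, fixed to 1

t_star <- rexp(n, rate = lambda_star)  # true failure times T_*
cens   <- rexp(n, rate = lambda_c)     # censoring times C
time   <- pmin(t_star, cens)           # observed times T = min(C, T_*)
status <- as.numeric(t_star <= cens)   # 1 if the failure was observed

# "Dumb" analysis: every observed time is treated as a failure
km_dumb <- survfit(Surv(time, rep(1, n)) ~ 1)
# Kaplan–Meier estimate that accounts for censoring
km <- survfit(Surv(time, status) ~ 1)

plot(km, conf.int = FALSE, xlab = "Time", ylab = "Survival probability")
lines(km_dumb, col = "red")
curve(exp(-lambda_star * x), add = TRUE, col = "blue", lty = 2)  # truth
legend("topright", legend = c("Kaplan-Meier", "Ignoring censoring", "True survival"),
       col = c("black", "red", "blue"), lty = c(1, 1, 2))

# Exponential rate estimates
rate_dumb <- 1 / mean(time)            # ignores censoring
rate_mle  <- sum(status) / sum(time)   # failures / total time at risk
c(dumb = rate_dumb, mle = rate_mle, truth = lambda_star)
```

Rerunning with several values of lambda_star shows how the bias of the naive approach depends on the proportion of censored observations.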

6. Tongue cancer

A study was conducted on the effects of ploidy on the prognosis of patients with cancers of the mouth. Patients were selected who had a paraffin-embedded sample of the cancerous tissue taken at the time of surgery. Follow-up survival data was obtained on each patient. The tissue samples were examined using a flow cytometer to determine if the tumor had an aneuploid (abnormal) or diploid (normal) DNA profile using a technique discussed in Sickle–Santanello et al. (1988). Times are in weeks.

7. Cox’s regression: the basics

In this exercise, we will learn how to make use of Cox’s proportional hazards model.

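library(survival)
# Null Cox model: no covariates on the right-hand side of the formula
fit <- coxph(Surv(hours, status) ~ 1, data = genfan)
fit
# With no covariates, survfit() returns the estimated baseline survival curve
plot(survfit(fit))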
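Since genfan has no covariate, a natural next step is a model with one. The sketch below uses the imotor data with temperature as the only covariate; this choice is illustrative, and the column names time, status and temp are assumed. It also checks the proportional hazards assumption with cox.zph(), which is based on scaled Schoenfeld residuals.

```r
library(survival)

# Illustrative model: imotor with temperature as the only covariate
fit_temp <- coxph(Surv(time, status) ~ temp, data = imotor)
summary(fit_temp)   # coefficient, hazard ratio exp(coef), Wald/score/LR tests

# Check the proportional hazards assumption
zph <- cox.zph(fit_temp)
zph
plot(zph)
```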

8. Start working on your project (if grading is based on a project rather than an exam)

This exercise is actually not an exercise; it explains the expectations for grading this lecture. You will have to conduct a complete survival analysis on a dataset chosen from those listed below. You should write a technical report (and include your R script as a separate file) and upload it to the dedicated area on Hippocampus.

