Geostatistics with the R software

Fig.: Kriging French Temperatures (as measured by Meteo France August, 31th at 6AM)
Fig.: Kriging French Temperatures (as measured by Meteo France August, 31th at 6AM)

Before we start

Before we start, I will use some colors that are suited for color blinded people. These colors come from the RColorBrewer package. I suggest you to do that for your work, it is a small extra work for us but means a lot to some of us!

## A qualitative palette
my_colors <- c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F",
               "#E5C494", "#B3B3B3")
## A sequential palette
my_colors_div <- c("#FFF5EB", "#FEE6CE", "#FDD0A2", "#FDAE6B", "#FD8D3C",
                   "#F16913", "#D94801", "#A63603", "#7F2704")

Introduction

For this lab we are going to model the spatial distribution of temperatures over France. To do so we will use the geoR package that you may need to install first (do it once only!), i.e.,

install.packages("geoR")

you’re all set to load the brand new library

library(geoR)
## --------------------------------------------------------------
##  Analysis of Geostatistical Data
##  For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
##  geoR version 1.9-2 (built on 2022-08-09) is now loaded
## --------------------------------------------------------------

The data consist of two separate files:

  • the list of the weather stations
  • time series for all weather stations

To retrive the list of the weather stations you can invoke the following lines

url_stations <- "https://donneespubliques.meteofrance.fr/donnees_libres/Txt/Synop/postesSynop.csv"
desfile <- "./stations.csv"

if (!file.exists(desfile))
  download.file(url_stations, destfile = desfile)        

## Read the csv file
stations <- read.csv2(desfile, dec = ".")

## Keep only mainland stations
## all my apologies for Corsica and "territoire ultra marins"
stations <- stations[1:40,]

head(stations)
##     ID             Nom Latitude Longitude Altitude
## 1 7005       ABBEVILLE 50.13600  1.834000       69
## 2 7015   LILLE-LESQUIN 50.57000  3.097500       47
## 3 7020 PTE DE LA HAGUE 49.72517 -1.939833        6
## 4 7027  CAEN-CARPIQUET 49.18000 -0.456167       67
## 5 7037      ROUEN-BOOS 49.38300  1.181667      151
## 6 7072    REIMS-PRUNAY 49.20967  4.155333       95

To plot the locations of these weather stations we need the additional library ggmap (to be installed first)

install.packages("ggmap")

Update: To plot maps, the above packages now requires a Google API key. I don’t want to force you on getting one. Hence we will use a different approach, less user friendly though and fancy.

## install.packages("geodata")## You need to do that once
library(geodata)
## Loading required package: terra
## terra 1.7.55
france <- gadm(country = "FRA", level = 0, path=tempdir())
plot(france)
points(stations$Longitude, stations$Latitude, pch = 15, col = my_colors[1])

Time series for each stations and a given month are available from this link.

For this lab I did the work for you and choose October, 2023 (Why? I don’t know… Maybe because we are November, 2023…). To retrieve the time series, just invoke

url <- "https://mribatet.perso.math.cnrs.fr/CentraleNantes/Data/synop.202310.csv"

## To handle the date correctly it will be easier for us to
## treat it as a character string
colClasses <- rep("numeric", 60)##there are 60 variables
colClasses[2] <- "character"
data <- read.csv2(url, na.strings = "mq", dec = ".",
                  colClasses = colClasses)

## Previously we selected only mainland stations
## we need to do the same here
data <- subset(data, data$numer_sta %in% stations$ID)

head(data)
##   numer_sta           date   pmer tend cod_tend  dd  ff      t     td  u    vv
## 1      7005 20231001000000 102460  -40        8 150 2.1 286.85 285.05 89  3350
## 2      7015 20231001000000 102470  -50        8 160 1.8 285.15 283.55 90 19740
## 3      7020 20231001000000 102270  -30        6 200 4.8 291.35 289.65 90 16000
## 4      7027 20231001000000 102400  -40        8 140 1.3 286.15 285.35 95 17240
## 5      7037 20231001000000 102500  -20        8 150 2.3 286.15 285.05 93 19430
## 6      7072 20231001000000 102610  -30        6 100 1.4 283.35 282.45 94 18660
##   ww w1 w2  n nbas hbas cl cm ch   pres niv_bar geop tend24 tn12 tn24 tx12 tx24
## 1 10 NA NA NA   NA   NA NA NA NA 101590      NA   NA   -190   NA   NA   NA   NA
## 2  0 NA NA NA    0   NA NA NA NA 101900      NA   NA   -130   NA   NA   NA   NA
## 3  1  0 NA 25    0 7000 30 20 15 102160      NA   NA   -380   NA   NA   NA   NA
## 4  0 NA NA NA    0   NA NA NA NA 101590      NA   NA   -250   NA   NA   NA   NA
## 5  0 NA NA NA    0   NA NA NA NA 100610      NA   NA   -220   NA   NA   NA   NA
## 6  0 NA NA NA    0   NA NA NA NA 101450      NA   NA    -90   NA   NA   NA   NA
##   tminsol sw tw raf10 rafper per etat_sol ht_neige ssfrai perssfrai rr1  rr3
## 1  285.85 NA NA   3.9    4.4 -10        0        0     NA        NA   0 -0.1
## 2  285.95 NA NA   3.4    3.4 -10        0        0     NA        NA   0  0.0
## 3      NA NA NA   6.8    8.0 -10       NA       NA     NA        NA   0  0.0
## 4  288.45 NA NA   2.4    3.8 -10        0        0     NA        NA   0  0.0
## 5  285.45 NA NA   2.7    3.4 -10       NA        0     NA        NA   0  0.0
## 6  282.55 NA NA   2.3    3.1 -10        0        0     NA        NA   0  0.0
##    rr6 rr12 rr24 phenspe1 phenspe2 phenspe3 phenspe4 nnuage1 ctype1 hnuage1
## 1 -0.1 -0.1  0.2       NA       NA       NA       NA      NA     NA      NA
## 2  0.0  0.0  0.2       NA       NA       NA       NA      NA     NA      NA
## 3  0.0  0.0  0.0       NA       NA       NA       NA       2      0    7000
## 4  0.0  0.0  0.2       NA       NA       NA       NA      NA     NA      NA
## 5  0.0  0.0  0.2       NA       NA       NA       NA      NA     NA      NA
## 6  0.0  0.0  0.2       NA       NA       NA       NA      NA     NA      NA
##   nnuage2 ctype2 hnuage2 nnuage3 ctype3 hnuage3 nnuage4 ctype4 hnuage4  X
## 1      NA     NA      NA      NA     NA      NA      NA     NA      NA NA
## 2      NA     NA      NA      NA     NA      NA      NA     NA      NA NA
## 3      NA     NA      NA      NA     NA      NA      NA     NA      NA NA
## 4      NA     NA      NA      NA     NA      NA      NA     NA      NA NA
## 5      NA     NA      NA      NA     NA      NA      NA     NA      NA NA
## 6      NA     NA      NA      NA     NA      NA      NA     NA      NA NA

Data preprocessing:

From the above output, you can see that there are plenty of variables in our dataset, but, as stated previously, we will work on temperature only, i.e., column named ‘t’. We will thus only keep the columns:

  • ‘numer_sta’: the unique identifier of the station
  • ‘date’: the date of record (AAAAMMDDHHMISS)
  • ‘t’: the temperature in Kelvin
data <- data[,c("numer_sta", "date", "t")]

## Switch to degrees Celsius
data$t <- data$t - 273.15

The date has an uncommon format and we need the software to take care of that. R can deal with date out of the box but working with hours, minutes and seconds needs an additional library

install.packages("chron")

We are now ready to treat the date as usual

library(chron)
## 
## Attaching package: 'chron'
## The following objects are masked from 'package:terra':
## 
##     origin, origin<-
date <- substr(data$date, 1, 8)## first 8 characters
date <- as.Date(date, format = "%Y%m%d")

hour <- substr(data$date, 9, 10)
minute <- substr(data$date, 11, 12)
second <- substr(data$date, 13, 14)

tmp <- paste(hour, minute, second, sep = ":")
data$date <- as.chron(date, tmp)

Let’s try to have a look at the time series at some given locations

## Focus on k specific stations
locations <- c("PERPIGNAN", "MONTPELLIER", "NICE")

selected_data <- NULL
for (location in locations){
  ## Get its unique identifier
  location_id <- stations$ID[stations$Nom == location]
  
  ## Retrieve the corresponding time series
  data_location <- subset(data, data$numer_sta == location_id)
  selected_data <- cbind(selected_data, data_location$t)
}
## Warning in cbind(selected_data, data_location$t): number of rows of result is
## not a multiple of vector length (arg 2)
matplot(selected_data, type = "l", lty = 1, xlab = "Index",
        ylab = "Temperature (C)", col = my_colors[1:3])
legend("topleft", locations, bty = "n", lty = 1, col = my_colors[1:3],
       ncol = 3)

Though not invariably, geostatistics often deals with a single realization of the spatial process. Hence we will focus on the specitif time October, 15th at 9AM (why? I don’t know again), i.e.,

fixed_date <- chron("10/15/2023", "09:00:00")
data <- subset(data, data$date == fixed_date)

The geoR package

The geoR package is very convenient for modelling spatial data. As far as I know there is no equivalent in Python where available libraries are extremely poor. Anyway we won’t start a controversy between R and Python here, but always wonder what does this software brings to me ;-)

When using the geoR package, it is convenient to use its base class geoData. Although there are many ways to do it, we will use the following code (make sure to understand what’s going on here)

## Create a data frame with all the variables
df <- data.frame(stations, temperature = data$t)

geodata <- as.geodata(df, coords.col = 4:3,##4:3 to have coordinates expressed as (lon, lat) not (lat, lon)
                      data.col = 6, covar.col = 5)
## as.geodata: 1 points removed due to NA in the data

Descriptive analysis

Using our just created geodata object, one can do our first descriptive plot.

  1. Read the help file of the plot.geodata function.
  2. Run the following code and comment.
##Note that I use 'plot' not 'plot.geodata' as R knows the class
plot(geodata, qt.col = my_colors[1:4])
  1. Plot the data w.r.t. elevation and comment.
## Insert code
  1. If needed re-read the documentation of the plot.geodata to remove any potential trend w.r.t. latitude.
## Insert code
  1. Read the documentation of the variog and variog4£ functions and plot your first empirical variograms.
## Insert code
  1. Compare the raw (binned) empirical variogram and that where the potential trend w.r.t. latitude was removed. Comment.
## Insert code

Fitting

  1. Play (a bit don’t spend the rest of the lecture on it ;-)) with the eyefit function to learn how the variogram / covariance parameters impact the variogram behaviour.
  2. Fit an exponential variogram using the variofit function without any nugget effect.
##Insert code
  1. (Optional) Try different variogram families.

Model checking

  1. Read the documentation of the xvalid function and use it to assess if your model is accurate enough.
## Insert code
  1. What could you possibly do to get better performance. ## Prediction

  2. Read the documentation of the krige.control and krige.conv functions.

  3. Perform your first spatial interpolation on the following grid

## Insert code