Search code examples
rinterpolationr-spgstat

Spatio Temporal Interpolation with gstat in R-Studio - Fitting the correct variogram


I am very new to spatial evaluation and come from psychology.

I am using the software R and the packages "gstat" and "spacetime".

I would like to do a spatio temporal interpolation. For this I follow the paper of Gräler et al. (https://cran.r-project.org/web/packages/gstat/vignettes/spatio-temporal-kriging.pdf)

Unfortunately I can't find/fit the right variogram model. I can create the empirical variogram and this is also conclusive to me, but then I do not get any further. I do not understand how to define the individual parameters such as "sill" or "nugget" or what they stand for.

Here are my previous approaches:

My ST-Dataframe:

> df.stf
An object of class "STFDF"
Slot "data":
# A tibble: 2,406 x 6
     KRS month DeprIndex MMW10 MMW25 NewInfect
   <dbl> <int>     <dbl> <dbl> <dbl>     <dbl>
 1  1001     1    -4.08   NA   NA           NA
 2  1002     1    -2.28   13.3 NA           NA
 3  1003     1    -3.29   17.8 11.7         NA
 4  1004     1    -4.31   NA   NA           NA
 5  1051     1    -2.51   17.7 NA           NA
 6  1053     1    -1.07   13.0  9.93        NA
 7  1054     1    -0.863  NA   NA           NA
 8  1055     1    -1.63   NA   NA           NA
 9  1056     1     0.887  NA   NA           NA
10  1057     1    -1.21   14.7  8.84        NA
# ... with 2,396 more rows

Slot "sp":
class       : SpatialPolygonsDataFrame 
features    : 401 
extent      : 3280359, 3921536, 5237511, 6103443  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=0 +lon_0=9 +k=1 +x_0=3500000 +y_0=0 +ellps=bessel +towgs84=598.1,73.7,418.2,0.202,0.045,-2.455,6.7 +units=m +no_defs 
variables   : 1
names       :   AGS 
min values  :  1001 
max values  : 16077 

Slot "time":
           timeIndex
0001-01-30         1
0002-01-30         2
0003-01-30         3
0004-01-30         4
0005-01-30         5
0006-01-30         6

Slot "endTime":
[1] "0002-01-30 CET" "0003-01-30 CET" "0004-01-30 CET" "0005-01-30 CET" "0006-01-30 CET" "0007-01-30 CET"

So, I have 401 german counties and the mean monthly PM10 value over six month.

My empirical variogram:

eVgmPm10<-variogramST(MMW10~1,df.stf,tlags = 0:5)

eVgmPm10$dist<-eVgmPm10$dist/1000
eVgmPm10$avgDist  <- eVgmPm10$avgDist/1000
eVgmPm10$spacelag <- eVgmPm10$spacelag/1000

plot(eVgmPm10, map=F)

empirical variogram

So far everything works as it should (I guess)

Now I wanted to make as in the paper explained a seperable model:

separableModel <- vgmST("separable",
                        space = vgm(0.9, "Exp", 200, 0.1),
                        time = vgm(0.9, "Sph", 3.5, 0.1),
                        sill = 124)

I understand, that I have to create a variogram for space and time. But I do not know how to define the parameters correctly (psill, model, range, nugget) for the variograms. If I use the same parameters as in the paper I get folowing plot:

Plot empircal and separable model

As you can see, in this model, I just have two lags and the variogram looks realy strange. So, I think it is because of the wrong parameter for my model. I also tested another approach like in the paper.

sumMetricModel <- vgmST("sumMetric",
                        space = vgm(20, "Sph", 150, 1),
                        time = vgm(10, "Exp", 2, 0.5),
                        joint = vgm(80, "Sph", 1500, 2.5),
                        stAni = 120)

Again, I do not have any idea and clue how to set the parameters so I took the same as in the paper and plot all three.

empirical, separable, sum metric

wireframe of all three

I am pretty sure it depend on the parameters but I do really not know how to find the correct variogram model to fit.

UPDATE AFTER FIRST ANSWER:

As suggested I used fewer timelage for the new empirical variogram:

eVgmPm10<-variogramST(MMW10~1,df.stf,tlags = 0:2)

I got this wireframe: wireframe

As you can see I got stil 3 time steps starting with "lag 0".

Than I tried to fill the model parameters as suggested.

separableModel<-vgmST("separable",
                      space=vgm(psill = 15,model = "Exp", range =250 ,nugget = 5),
                      time = vgm(psill = 15,model = "Exp", range =1 ,nugget = 0),
                      sill = 15)

Here is the wireframe of both:

wireframe

I really frustrated and I am really sorry for my lack of knowledge, but I really try hard to understand it and I am thankfull for every help


Solution

  • The initial parameters of sill, range and nugget can be read from the empirical spatiotemporal variogram. Produce a 3D wireframe plot of your empirical variogram by:

    plot(eVgmPm10, wireframe=T, scales=list(arrows=F))
    

    The nugget is about the approximate intersection of the surface with the z-axis (this amounts to small scale variability that the model cannot explain). The sill is about the maximum z-value your empirical surface approaches and the range is about the distance where the surface flattens out. You will have a spatial (x-axis) and temporal (y-axis) range.

    Different models interpret these values slightly differently; details are given in Section 2.1 of the gstat vignette which you reference in your question. In the separable case, e.g., the spatial and temporal variogram components are normalised to a joint sill (sill + nugget) of 1 and the model is "stretched" by the sill parameter in the model definition. In the sum-metric models, this normalisation does not apply. For the metric components, a spatiotemporal anisotropy has to be fitted in order to relate spatial and temporal distances in a single model (check gstat::estiStAni() for that). I recommend to run demo("stkrige") in R and to experiment with the parameters to get a better feeling of their influence. Note that the parameters affect each other as well which sometimes makes the interpretation and numerical optimisation troublesome.

    Based on what you provided in your question, a nugget of 5, sill of 15 and spatial range of 250 might be values to start with. Furthermore, it appears as if your empirical surface moves in waves along the temporal domain. This might be an artefact of the temporal coverage of your data set. 6 time steps (months in your case) are very few to estimate any temporal dependence, in especially beyond 1 (or maybe 2) steps. Your plot indicates that you are looking at 5 time steps, where "lag5" will only be based on the values of the first and last month. When estimating your empirical variogram consider using fewer temporal lags:

    eVgmPm10<-variogramST(MMW10~1, df.stf, tlags = 0:2)
    

    Referring to your question's title "the correct variogram", it will be very hard to identify what that is. You can only find a model which will be a "good" approximation of the dependence structure in the data that you have provided to the algorithm. The dependence might for instance change between winter and summer months and there might be seasonal components in the mean which also need to be reflected in the model.