Problems creating a spacetime object with the spacetime package

I want to run a monthly space-time analysis of PM10 across German counties and plot the results. Later on I want to compare different regression models. For this, and for other research questions I am going to work on, I need a spacetime object, and I am stuck at the point where I cannot create a proper one. I started by trying to understand the methods and the packages as far as I can.

I am using the following reproducible code as a guide (source: https://edzer.github.io/UseR2016/):

data("Produc", package = "plm")
Produc[1:5,1:9]

library(maps)
states.m = map('state', plot=FALSE, fill=TRUE)
IDs <- sapply(strsplit(states.m$names, ":"), function(x) x[1])
library(maptools)

states = map2SpatialPolygons(states.m, IDs=IDs)

yrs = 1970:1986
time = as.POSIXct(paste(yrs, "-01-01", sep=""), tz = "GMT")
time

library(spacetime)
Produc.st = STFDF(states[-8], time, Produc[order(Produc[2], Produc[1]),])
library(RColorBrewer)
stplot(Produc.st[,,"unemp"], yrs, col.regions = brewer.pal(9, "YlOrRd"), cuts = 9)

For example, I would like to evaluate the PM10 values up to 2020-06-01, monthly and at county level. I received the data for this from the Federal Environment Agency of Germany. The data look as follows: PM10 is my df, and the value of interest is TMW, the daily mean measurement of PM10.

PM10[sample(nrow(PM10),10),]
# A tibble: 10 x 9
   Station Komponente Datum      TYPEOFAREA            TYPEOFSTATION   TMW TMW_R TypeOfData Lieferung
   <chr>   <chr>      <date>     <chr>                 <chr>         <dbl> <dbl> <chr>      <chr>    
 1 DENI051 PM10       2020-02-28 ländliches Gebiet     Hintergrund    5.40     5 S          M        
 2 DETH095 PM10       2020-05-12 städtisches Gebiet    Hintergrund    9.74    10 S          M        
 3 DEBY118 PM10       2020-04-30 städtisches Gebiet    Hintergrund    5.27     5 S          M        
 4 DEBY072 PM10       2020-05-03 ländlich regional     Hintergrund    8.43     8 S          M        
 5 DEHE060 PM10       2020-06-01 ländlich regional     Hintergrund    9.43     9 S          M        
 6 DEBW087 PM10       2020-05-28 ländlich regional     Hintergrund   11.0     11 S          M        
 7 DEBW038 PM10       2020-03-11 städtisches Gebiet    Hintergrund    4.28     4 S          M        
 8 DENW065 PM10       2020-01-10 ländlich regional     Hintergrund    2.16     2 S          M        
 9 DENW096 PM10       2020-05-17 vorstädtisches Gebiet Hintergrund   13.2     13 T          M        
10 DEHE050 PM10       2020-04-20 ländliches Gebiet     Hintergrund    8.20     8 S          D

Then I downloaded an sp file from https://gadm.org/download_country_v3.html --> Germany --> R(sp) --> level2

It contains the map of Germany at county level and looks like this:

> de
class       : SpatialPolygonsDataFrame 
features    : 403 
extent      : 5.866251, 15.04181, 47.27012, 55.05653  (xmin, xmax, ymin, ymax)
crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 13
names       : GID_0,  NAME_0,   GID_1,            NAME_1, NL_NAME_1,     GID_2,    NAME_2, VARNAME_2, NL_NAME_2,     TYPE_2,  ENGTYPE_2,  CC_2,   HASC_2 
min values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.1_1, Ahrweiler,        NA,        NA,      Kreis,   District, 01001, DE.BB.BH 
max values  :   DEU, Germany, DEU.9_1,         Thüringen,        NA, DEU.9.9_1,   Zwickau,        NA,        NA, Water body, Water body, 16077, DE.TH.WR

Since my df is georeferenced only by station code and not at county level, I added the county ID to the dataset. The county ID in my sp file is CC_2, a five-digit code that starts with a 0 when the ID would otherwise have only four digits. Example:

de$CC_2
  [1] "08425" "08211" "08426" "08115" "12065" "12066" "12067"

The first problem, I guess, is that when I add the geoinformation to my df via the station codes, CC_2 ends up in the df like this:

> PM10_m[sample(nrow(PM10_m),3),]
      Station Komponente      Datum         TYPEOFAREA TYPEOFSTATION       TMW TMW_R TypeOfData Lieferung  CC_2
11448 DEBW081       PM10 2020-06-07 städtisches Gebiet   Hintergrund  6.775362     7          T         M  8212
1566  DEBB066       PM10 2020-04-19  ländlich regional   Hintergrund 11.162500    11          S         M 12061
7174  DEBW027       PM10 2020-03-20 städtisches Gebiet   Hintergrund 34.791667    35          S         M  8415

As you can see, the leading 0 of the four-digit IDs is missing, so I checked the structure of the variables:

str(PM10_m$CC_2)
 chr [1:47350] "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" ...


str(de$CC_2)
 chr [1:403] "08425" "08211" "08426" "08115" NA "08435" "08315" "08235" "08316" "08236" "08116" "08311" "08237" "08117" ...

So both are chr, but if I matched them directly, none of the four-digit IDs would match! I handled this by converting both variables to numeric. I am not sure whether this is the right way to do it.

> PM10_m$CC_2<-as.numeric(PM10_m$CC_2)
> de$CC_2.2<-as.numeric(de$CC_2)
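
A possible alternative to the numeric conversion, as a minimal sketch and not what I actually ran: keep both IDs as character and left-pad the four-digit codes with a zero, so the join key keeps its five-digit form. `pad_id` is a hypothetical helper name, and the example IDs are taken from the output above.

```r
# Hypothetical helper: left-pad county IDs to five characters with zeros.
# NA values (e.g. polygons without a CC_2 code) are passed through unchanged.
pad_id <- function(x, width = 5L) {
  x <- as.character(x)
  ok <- !is.na(x)
  n_missing <- pmax(0L, width - nchar(x[ok]))
  x[ok] <- paste0(strrep("0", n_missing), x[ok])
  x
}

ids <- c("8212", "12061", "8415", NA)
pad_id(ids)
# "08212" "12061" "08415" NA
```

With padded IDs, the merge could use CC_2 on both sides directly instead of the extra numeric column.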

Before merging them, I aggregated the PM10_m df by county ID and date.

PM10_aggr<-aggregate(PM10_m$TMW, by = list(PM10_m$Datum, PM10_m$CC_2), FUN="mean", na.rm=T)
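
Note that `aggregate()` called with an unnamed `by = list(...)` names its output columns `Group.1`, `Group.2` and `x`. A sketch on toy data of how the column names `date`, `CC_2` and `TMW10` that appear further down could be produced; the toy values are invented for illustration only.

```r
# Toy stand-in for PM10_m (values invented for illustration only)
toy <- data.frame(
  Datum = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01")),
  CC_2  = c(8212, 8212, 8212, 12061),
  TMW   = c(10, 20, 5, NA)
)

# Naming the list elements gives readable grouping columns
toy_aggr <- aggregate(toy$TMW,
                      by = list(date = toy$Datum, CC_2 = toy$CC_2),
                      FUN = mean, na.rm = TRUE)

# The value column is called "x" by default; rename it
names(toy_aggr)[names(toy_aggr) == "x"] <- "TMW10"
toy_aggr
```

A group whose values are all NA yields NaN with `na.rm = TRUE`, which is where NaN entries in the aggregated means can come from.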

Then I merged the aggregated df with the polygon df de to see if it worked.

de_t<-merge(de,PM10_aggr, by.x="CC_2.2", by.y="CC_2", na.rm=T,duplicateGeoms=TRUE)

As far as I can see, it matched properly: [figure: Plotting with tmap]
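
A sketch of a more systematic check than looking at the map, using toy vectors that stand in for `PM10_aggr$CC_2` and `de$CC_2.2` (the values are invented): which data IDs have no polygon, which polygon IDs have no data, and whether any polygon ID occurs more than once.

```r
data_ids <- c(8212, 12061, 8415, 99999)      # stand-in for PM10_aggr$CC_2
poly_ids <- c(8212, 8212, 12061, 8415, NA)   # stand-in for de$CC_2.2

setdiff(data_ids, poly_ids)                  # data IDs without a polygon
setdiff(poly_ids, data_ids)                  # polygon IDs without data
unique(poly_ids[duplicated(poly_ids)])       # polygon IDs occurring more than once
```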

Next I tried to create a spacetime object, following the steps in the guide (see the beginning):

First I added the month to my df PM10_aggr

PM10_f<-PM10_aggr
PM10_f$month<-strftime(PM10_aggr$date, format = "%m")

> PM10_f[sample(nrow(PM10_f),4),]
            date  CC_2     TMW10 month
26303 2020-04-04 13062  6.136208    04
24703 2020-05-12 12072  7.506250    05
4808  2020-03-16  3452 13.933222    03
30502 2020-04-17 16051 30.121002    04

Creating SpaceTime object:

month = 01:06
time = as.POSIXct(paste(month, "-01-01", sep=""), tz = "GMT")
time

[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"
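
For comparison, a sketch of how time stamps for the first day of each month of 2020 could be built (this is not what I ran above; the year 2020 is taken from my data). In the code above, `01:06` is read as `1:6`, so `paste` produces strings like "1-01-01", which are parsed as the year 1.

```r
# First day of each month, January to June 2020
month <- 1:6
time_2020 <- as.POSIXct(sprintf("2020-%02d-01", month), tz = "GMT")
format(time_2020, "%Y-%m-%d")
```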

My time vector did not come out like the one in the guide (the years read 0001 to 0006), but as far as I understand it only serves as a categorical time index. So I continued with the next step of the guide:

library(spacetime)

pm10.st = STFDF(de, time, PM10_f[order(PM10_f[4], PM10_f[1]),])
Error in validityMethod(object) : 
  nrow(object@data) == length(object@sp) * nrow(object@time) is not TRUE
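
The condition in the error message says that an STFDF needs exactly one data row for every combination of spatial feature and time step. A quick sketch of that check in plain R; `fits_full_grid` is a hypothetical helper, and 1092 is the row count of my monthly aggregate shown further down.

```r
# Hypothetical helper mirroring the condition in the error message:
# nrow(data) must equal (number of features) * (number of time steps)
fits_full_grid <- function(n_rows, n_features, n_times) {
  n_rows == n_features * n_times
}

n_features <- 403  # polygons in de
n_times    <- 6    # length(time)
n_features * n_times                        # rows STFDF would expect: 2418
fits_full_grid(1092, n_features, n_times)   # FALSE
```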

I read that the STFDF command cannot handle missing geopoints and that I should use the STIDF command instead.

So, this is what I get:

pm10.st = STIDF(de, time, PM10_f[order(PM10_f[4], PM10_f[1]),])

> pm10.st
An object of class "STIDF"
Slot "data":
          date  KRS    TMW10 month month1
1   2020-01-01 1002 33.34608    01      1
183 2020-01-01 1003 81.06596    01      1
365 2020-01-01 1051 53.14400    01      1
547 2020-01-01 1053 34.36517    01      1
729 2020-01-01 1054      NaN    01      1
911 2020-01-01 1057 32.04604    01      1

Slot "sp":
class       : SpatialPolygonsDataFrame 
features    : 6 
extent      : 8.108812, 10.24141, 47.5024, 48.86768  (xmin, xmax, ymin, ymax)
crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 14
names       : GID_0,  NAME_0,   GID_1,            NAME_1, NL_NAME_1,     GID_2,          NAME_2, VARNAME_2, NL_NAME_2,     TYPE_2,  ENGTYPE_2,  CC_2,   HASC_2, CC_2.2 
min values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.1_1, Alb-Donau-Kreis,        NA,        NA,  Landkreis,   District, 08115, DE.BW.AD,   8115 
max values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.6_1,   Bodenseekreis,        NA,        NA, Water body, Water body, 08435, DE.BW.BR,   8435 

Slot "time":
           timeIndex
0001-01-01         1
0002-01-01         2
0003-01-01         3
0004-01-01         4
0005-01-01         5
0006-01-01         6

Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"

I was really surprised to see that the command took just 6 rows from the df and matched them with just 6 features of the polygon df. I can plot this STIDF: [figure: Plot STIDF]

But as you can see, it did not work properly. So I guessed that I might have to aggregate by month and county ID:

pm10.f<-aggregate(PM10_f$TMW10, by = list(PM10_f$month, PM10_f$KRS),FUN="mean", na.rm=T)

> str(pm10.f)
'data.frame':   1092 obs. of  3 variables:
 $ month: chr  "01" "02" "03" "04" ...
 $ CID  : num  1002 1002 1002 1002 1002 ...
 $ MMW10: num  13.3 11.1 14.2 16.1 12.4 ...

### CID is the County ID ###

> pm10.f[sample(nrow(pm10.f),5),]
     month   CID     MMW10
234     06  5158 16.637490
704     02  9775 11.083747
1030    04 16055 18.934881
842     02 13054  8.594628
513     03  8121 16.9119

So, I tried again with the STIDF command:

pm10.stf = STIDF(de, time, pm10.f[order(pm10.f[1], pm10.f[1]),])

> pm10.stf
An object of class "STIDF"
Slot "data":
   month  CID    MMW10
1     01 1002 13.31264
7     01 1003 17.81540
13    01 1051 17.67919
19    01 1053 12.99228
25    01 1054      NaN
31    01 1057 14.71878

Slot "sp":
class       : SpatialPolygonsDataFrame 
features    : 6 
extent      : 8.108812, 10.24141, 47.5024, 48.86768  (xmin, xmax, ymin, ymax)
crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 14
names       : GID_0,  NAME_0,   GID_1,            NAME_1, NL_NAME_1,     GID_2,          NAME_2, VARNAME_2, NL_NAME_2,     TYPE_2,  ENGTYPE_2,  CC_2,   HASC_2, CC_2.2 
min values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.1_1, Alb-Donau-Kreis,        NA,        NA,  Landkreis,   District, 08115, DE.BW.AD,   8115 
max values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.6_1,   Bodenseekreis,        NA,        NA, Water body, Water body, 08435, DE.BW.BR,   8435 

Slot "time":
           timeIndex
0001-01-01         1
0002-01-01         2
0003-01-01         3
0004-01-01         4
0005-01-01         5
0006-01-01         6

Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"

I got the same problem: again just 6 random rows are matched with 6 counties: [figure: plot STIDF 2]

Even if I remove the order call, I get the same problem, with just 6 rows from the df and 6 features from the polygon df:

pm10.stf = STIDF(de, time, pm10.f)

> pm10.stf
An object of class "STIDF"
Slot "data":
  month  CID    MMW10
1    01 1002 13.31264
2    02 1002 11.10590
3    03 1002 14.19649
4    04 1002 16.10512
5    05 1002 12.38511
6    06 1002 13.10104

Slot "sp":
class       : SpatialPolygonsDataFrame 
features    : 6 
extent      : 8.108812, 10.24141, 47.5024, 48.86768  (xmin, xmax, ymin, ymax)
crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
variables   : 14
names       : GID_0,  NAME_0,   GID_1,            NAME_1, NL_NAME_1,     GID_2,          NAME_2, VARNAME_2, NL_NAME_2,     TYPE_2,  ENGTYPE_2,  CC_2,   HASC_2, CC_2.2 
min values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.1_1, Alb-Donau-Kreis,        NA,        NA,  Landkreis,   District, 08115, DE.BW.AD,   8115 
max values  :   DEU, Germany, DEU.1_1, Baden-Württemberg,        NA, DEU.1.6_1,   Bodenseekreis,        NA,        NA, Water body, Water body, 08435, DE.BW.BR,   8435 

Slot "time":
           timeIndex
0001-01-01         1
0002-01-01         2
0003-01-01         3
0004-01-01         4
0005-01-01         5
0006-01-01         6

Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"

Now I get 6 rows for a single county in the df, but 6 different polygon features. It seems that the STIDF command just takes the first 6 polygons from the polygon df.

Solution

First, I noticed that my shapefile had more elements than there are actual districts. This is because the shapefile contains duplicate geometries ("DoubleGeoms"). So I aggregated the shapefile as follows:

# dissolve duplicate geometries; the result is the de2 object used below
de2 <- raster::aggregate(de, by="AGS")

Then I realized that I had made a logical error. I have 401 districts and 6 measurement times (6 months), so my dataframe should have 401*6=2406 rows. This meant that I had to adjust my dataframe, so I took the 401 districts and expanded them:

df<-tidyr::expand_grid(KRS=df$KRS,1:6)
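
A base-R sketch of the expand-and-merge idea on toy data (values invented; `expand.grid` is used here in place of `tidyr::expand_grid`). Every district gets one row per month, combinations without a measurement become NA, and the rows are sorted so that the districts cycle fastest within each month. The sketch assumes that the district order has to follow the order of the polygons in the spatial object, since STFDF pairs rows with features by position.

```r
# Toy stand-ins (values invented for illustration)
poly_ids <- c(1003, 1001, 1002)                 # district IDs in polygon order
obs <- data.frame(KRS   = c(1001, 1001, 1003),
                  month = c(1L, 2L, 1L),
                  MMW10 = c(13.3, 11.1, 17.8))

# Full grid: every district in every month (3 districts x 2 months = 6 rows)
grid <- expand.grid(KRS = poly_ids, month = 1:2)

# Left join keeps all grid rows; combinations without a measurement become NA
full <- merge(grid, obs, by = c("KRS", "month"), all.x = TRUE)

# Sort by month, then by position of the district in the polygon object,
# so that the spatial index cycles fastest
full <- full[order(full$month, match(full$KRS, poly_ids)), ]
rownames(full) <- NULL
full
```

The number of rows in `full` equals the number of districts times the number of months, which is the condition the STFDF validity check asks for.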

After adding the variables to the new dataframe by district and month with the "merge" command, I could use the "STFDF" command from the "spacetime" package:

df.stf <- STFDF(de2, time, df[order(df[2], df[1]),])

And this is the result: