I want to do a monthly space-time analysis of PM10 over German counties and plot the results. Later on I want to analyze different regression models. But I can't create the spacetime object that I need for this and for other research questions I am going to work on. So I first tried to understand the methods and packages as far as I can, and I am stuck at the point where I can't create a proper spacetime object.
As a guide I am using the following reproducible code (source: https://edzer.github.io/UseR2016/):
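# Panel data from the guide: 48 US states x 17 years (1970-1986)
data("Produc", package = "plm")
Produc[1:5, 1:9]
library(maps)
# State polygons; region names of the form "state:part" are cut at ":"
# so that all parts of one state get the same ID
states.m = map('state', plot = FALSE, fill = TRUE)
IDs <- sapply(strsplit(states.m$names, ":"), function(x) x[1])
library(maptools)  # note: maptools has since been retired from CRAN
states = map2SpatialPolygons(states.m, IDs = IDs)
# One time stamp per year, on 1 January
yrs = 1970:1986
time = as.POSIXct(paste(yrs, "-01-01", sep = ""), tz = "GMT")
time
library(spacetime)
# states[-8] drops the District of Columbia, which is not in Produc;
# rows are sorted by year, then state, so the spatial index moves fastest
Produc.st = STFDF(states[-8], time, Produc[order(Produc$year, Produc$state), ])
library(RColorBrewer)
stplot(Produc.st[, , "unemp"], yrs, col.regions = brewer.pal(9, "YlOrRd"), cuts = 9)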
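The key detail in the guide is the row order passed to STFDF(): spacetime expects the spatial index to move fastest, i.e. all locations for the first time step, then all locations for the second, and so on, in the same order as the spatial object. A minimal base-R illustration with a made-up panel (the state names and values are hypothetical):

```r
# Hypothetical long-format panel: 3 states x 2 years, rows in arbitrary order
panel <- data.frame(
  state = c("B", "A", "C", "A", "C", "B"),
  year  = c(1971, 1970, 1970, 1971, 1971, 1970),
  unemp = c(5.1, 4.0, 6.2, 4.4, 6.0, 5.5),
  stringsAsFactors = FALSE
)

# Sort by time first, then by location: within each year the states
# appear in the same order as the spatial objects (A, B, C)
panel_st <- panel[order(panel$year, panel$state), ]

# A full space-time grid has exactly one row per location and time step
n_sp   <- length(unique(panel$state))
n_time <- length(unique(panel$year))
stopifnot(nrow(panel_st) == n_sp * n_time)

panel_st$state  # "A" "B" "C" "A" "B" "C"
panel_st$year   # 1970 1970 1970 1971 1971 1971
```

In the guide, this works because the polygon IDs and the states in Produc are both in alphabetical order.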
For example, I would like to evaluate the PM10 values up to 2020-06-01 monthly at county level. For this I received data from the Federal Environment Agency of Germany. The data look as follows: PM10 is my df, and the variable of interest is TMW, the daily mean PM10 measurement.
PM10[sample(nrow(PM10),10),]
# A tibble: 10 x 9
Station Komponente Datum TYPEOFAREA TYPEOFSTATION TMW TMW_R TypeOfData Lieferung
<chr> <chr> <date> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 DENI051 PM10 2020-02-28 ländliches Gebiet Hintergrund 5.40 5 S M
2 DETH095 PM10 2020-05-12 städtisches Gebiet Hintergrund 9.74 10 S M
3 DEBY118 PM10 2020-04-30 städtisches Gebiet Hintergrund 5.27 5 S M
4 DEBY072 PM10 2020-05-03 ländlich regional Hintergrund 8.43 8 S M
5 DEHE060 PM10 2020-06-01 ländlich regional Hintergrund 9.43 9 S M
6 DEBW087 PM10 2020-05-28 ländlich regional Hintergrund 11.0 11 S M
7 DEBW038 PM10 2020-03-11 städtisches Gebiet Hintergrund 4.28 4 S M
8 DENW065 PM10 2020-01-10 ländlich regional Hintergrund 2.16 2 S M
9 DENW096 PM10 2020-05-17 vorstädtisches Gebiet Hintergrund 13.2 13 T M
10 DEHE050 PM10 2020-04-20 ländliches Gebiet Hintergrund 8.20 8 S D
Then I downloaded an sp file from https://gadm.org/download_country_v3.html (Germany → R (sp) → level 2), which contains the map of Germany at county level. It looks like this:
> de
class : SpatialPolygonsDataFrame
features : 403
extent : 5.866251, 15.04181, 47.27012, 55.05653 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 13
names : GID_0, NAME_0, GID_1, NAME_1, NL_NAME_1, GID_2, NAME_2, VARNAME_2, NL_NAME_2, TYPE_2, ENGTYPE_2, CC_2, HASC_2
min values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.1_1, Ahrweiler, NA, NA, Kreis, District, 01001, DE.BB.BH
max values : DEU, Germany, DEU.9_1, Thüringen, NA, DEU.9.9_1, Zwickau, NA, NA, Water body, Water body, 16077, DE.TH.WR
Since my df contains the station codes but no georeference at county level, I added the county ID to the dataset. The county ID in my sp file is CC_2, a five-digit character code with a leading 0 when the number itself has only four digits. Example:
de$CC_2
[1] "08425" "08211" "08426" "08115" "12065" "12066" "12067"
The first problem, I think, is that after adding the geoinformation to my df via the station codes, CC_2 in the df looks like this:
> PM10_m[sample(nrow(PM10_m),3),]
Station Komponente Datum TYPEOFAREA TYPEOFSTATION TMW TMW_R TypeOfData Lieferung CC_2
11448 DEBW081 PM10 2020-06-07 städtisches Gebiet Hintergrund 6.775362 7 T M 8212
1566 DEBB066 PM10 2020-04-19 ländlich regional Hintergrund 11.162500 11 S M 12061
7174 DEBW027 PM10 2020-03-20 städtisches Gebiet Hintergrund 34.791667 35 S M 8415
As you can see, the leading 0 of the four-digit IDs is missing, so I checked the structure of both variables:
str(PM10_m$CC_2)
chr [1:47350] "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" "12062" ...
str(de$CC_2)
chr [1:403] "08425" "08211" "08426" "08115" NA "08435" "08315" "08235" "08316" "08236" "08116" "08311" "08237" "08117" ...
So both are character vectors, but if I matched them as they are, none of the four-digit IDs would match. I handled this by converting both variables to numeric, but I am not sure whether this is the right way to do it.
> PM10_m$CC_2<-as.numeric(PM10_m$CC_2)
> de$CC_2.2<-as.numeric(de$CC_2)
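An alternative to converting both keys to numeric, sketched in base R: keep the key as text and restore the leading zero on the data side, so that it matches the five-character CC_2 in de directly. pad_ags() is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: restore the leading zero that was lost when the
# county key was stored as a number, keeping NA as NA
pad_ags <- function(x) {
  x <- as.integer(x)              # accepts "8212", "08212" and 8212 alike
  out <- sprintf("%05d", x)       # five digits, zero-padded
  out[is.na(x)] <- NA_character_  # sprintf() would otherwise return "NA"
  out
}

pad_ags(c("8212", "12061", "08415", NA))
# "08212" "12061" "08415" NA
```

With PM10_m$CC_2 <- pad_ags(PM10_m$CC_2), the merge could use CC_2 on both sides without a helper column such as CC_2.2. Converting both sides to numeric also works, as long as it is done consistently for both objects.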
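Before merging them, I aggregated the PM10_m df by county ID and date:
PM10_aggr <- aggregate(PM10_m$TMW, by = list(date = PM10_m$Datum, CC_2 = PM10_m$CC_2), FUN = mean, na.rm = TRUE)
names(PM10_aggr)[names(PM10_aggr) == "x"] <- "TMW10"  # aggregate() names the value column "x"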
Then I merged the df with the polygon df de to see whether it worked:
# duplicateGeoms = TRUE repeats a polygon for every matching row (one per day)
de_t <- merge(de, PM10_aggr, by.x = "CC_2.2", by.y = "CC_2", duplicateGeoms = TRUE)
As far as I can see, it matched properly (image: plotting with tmap).
Next, I started to create a spacetime object, following the steps of the guide (see the beginning):
First, I added the month to my df PM10_aggr:
PM10_f<-PM10_aggr
PM10_f$month<-strftime(PM10_aggr$date, format = "%m")
> PM10_f[sample(nrow(PM10_f),4),]
date CC_2 TMW10 month
26303 2020-04-04 13062 6.136208 04
24703 2020-05-12 12072 7.506250 05
4808 2020-03-16 3452 13.933222 03
30502 2020-04-17 16051 30.121002 04
Creating the spacetime object:
month = 01:06
time = as.POSIXct(paste(month, "-01-01", sep=""), tz = "GMT")
time
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"
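For comparison, a sketch of a monthly time axis. paste(month, "-01-01") puts the month number in the year position, which is why the output shows 0001-01-01 to 0006-01-01. Assuming the months belong to 2020 (the study period in the question), one time stamp per month can be built with seq():

```r
# First day of each month, January to June 2020 (assumed study period)
time <- seq(as.POSIXct("2020-01-01", tz = "GMT"), by = "month", length.out = 6)
format(time, "%Y-%m-%d")
# "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01" "2020-06-01"
```

These are real calendar months, so stplot() can label the panels with dates rather than with the years 1 to 6.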
This did not work like in the guide: the result contains the years 0001 to 0006 instead of months. As far as I understood, it is just a categorical time object anyway, so I moved on to the next step of the guide:
library(spacetime)
pm10.st = STFDF(de, time, PM10_f[order(PM10_f[4], PM10_f[1]),])
Error in validityMethod(object) :
nrow(object@data) == length(object@sp) * nrow(object@time) is not TRUE
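The error message states the rule that STFDF() enforces: the data frame needs exactly length(sp) * nrow(time) rows, one per polygon and time step, with the spatial index moving fastest. With the 403 features of de and 6 time stamps that would be 403 * 6 = 2418 rows, whereas PM10_f has one row per county and day, and only for counties with a station. A small sketch with three made-up point locations standing in for polygons; missing measurements are kept as NA rows instead of being dropped:

```r
library(sp)
library(spacetime)

# Three hypothetical locations (points stand in for county polygons)
locs <- SpatialPoints(cbind(x = c(7, 9, 11), y = c(50, 51, 52)))
# Two monthly time steps
months <- as.POSIXct(c("2020-01-01", "2020-02-01"), tz = "GMT")

# 3 locations x 2 months = 6 rows, spatial index moving fastest
vals <- data.frame(
  loc   = rep(1:3, times = 2),
  month = rep(1:2, each = 3),
  pm10  = c(10, 12, 14, 11, 13, NA)  # NA: no measurement at location 3 in February
)

st <- STFDF(locs, months, vals)

# The same condition that failed in the error above
nrow(st@data) == length(st@sp) * nrow(st@time)  # TRUE
```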
I read that STFDF cannot handle missing locations and that I have to use STIDF instead.
So, this is what I get:
pm10.st = STIDF(de, time, PM10_f[order(PM10_f[4], PM10_f[1]),])
> pm10.st
An object of class "STIDF"
Slot "data":
date KRS TMW10 month month1
1 2020-01-01 1002 33.34608 01 1
183 2020-01-01 1003 81.06596 01 1
365 2020-01-01 1051 53.14400 01 1
547 2020-01-01 1053 34.36517 01 1
729 2020-01-01 1054 NaN 01 1
911 2020-01-01 1057 32.04604 01 1
Slot "sp":
class : SpatialPolygonsDataFrame
features : 6
extent : 8.108812, 10.24141, 47.5024, 48.86768 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 14
names : GID_0, NAME_0, GID_1, NAME_1, NL_NAME_1, GID_2, NAME_2, VARNAME_2, NL_NAME_2, TYPE_2, ENGTYPE_2, CC_2, HASC_2, CC_2.2
min values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.1_1, Alb-Donau-Kreis, NA, NA, Landkreis, District, 08115, DE.BW.AD, 8115
max values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.6_1, Bodenseekreis, NA, NA, Water body, Water body, 08435, DE.BW.BR, 8435
Slot "time":
timeIndex
0001-01-01 1
0002-01-01 2
0003-01-01 3
0004-01-01 4
0005-01-01 5
0006-01-01 6
Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"
I was really surprised to see that the command took just 6 rows from the df and matched them with just 6 features of the polygon df. I can plot this STIDF (image: Plot STIDF).
But as you can see, it did not work properly. So I guessed that I might have to aggregate by month and county ID:
pm10.f <- aggregate(PM10_f$TMW10, by = list(month = PM10_f$month, CID = PM10_f$KRS), FUN = mean, na.rm = TRUE)
names(pm10.f)[names(pm10.f) == "x"] <- "MMW10"  # aggregate() names the value column "x"
> str(pm10.f)
'data.frame': 1092 obs. of 3 variables:
$ month: chr "01" "02" "03" "04" ...
$ CID : num 1002 1002 1002 1002 1002 ...
$ MMW10: num 13.3 11.1 14.2 16.1 12.4 ...
### CID is the County ID ###
> pm10.f[sample(nrow(pm10.f),5),]
month CID MMW10
234 06 5158 16.637490
704 02 9775 11.083747
1030 04 16055 18.934881
842 02 13054 8.594628
513 03 8121 16.9119
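The NaN values (for example for county 1054) come from mean(..., na.rm = TRUE) over a group in which every value is NA: after removing the NAs nothing is left, and the mean of an empty vector is NaN. A base-R sketch with made-up values showing the effect, and one way to turn such cells into ordinary NA:

```r
# Hypothetical daily values for two counties; county 1054 has only NAs
daily <- data.frame(
  month = c("01", "01", "01", "01"),
  CID   = c(1002, 1002, 1054, 1054),
  TMW   = c(12, 14, NA, NA)
)

monthly <- aggregate(daily$TMW,
                     by = list(month = daily$month, CID = daily$CID),
                     FUN = mean, na.rm = TRUE)
names(monthly)[names(monthly) == "x"] <- "MMW10"
monthly$MMW10  # 13 NaN

# Treat "no valid measurement" as an ordinary missing value
monthly$MMW10[is.nan(monthly$MMW10)] <- NA
monthly$MMW10  # 13 NA
```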
So, I tried again with the STIDF command:
pm10.stf = STIDF(de, time, pm10.f[order(pm10.f$month, pm10.f$CID), ])
> pm10.stf
An object of class "STIDF"
Slot "data":
month CID MMW10
1 01 1002 13.31264
7 01 1003 17.81540
13 01 1051 17.67919
19 01 1053 12.99228
25 01 1054 NaN
31 01 1057 14.71878
Slot "sp":
class : SpatialPolygonsDataFrame
features : 6
extent : 8.108812, 10.24141, 47.5024, 48.86768 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 14
names : GID_0, NAME_0, GID_1, NAME_1, NL_NAME_1, GID_2, NAME_2, VARNAME_2, NL_NAME_2, TYPE_2, ENGTYPE_2, CC_2, HASC_2, CC_2.2
min values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.1_1, Alb-Donau-Kreis, NA, NA, Landkreis, District, 08115, DE.BW.AD, 8115
max values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.6_1, Bodenseekreis, NA, NA, Water body, Water body, 08435, DE.BW.BR, 8435
Slot "time":
timeIndex
0001-01-01 1
0002-01-01 2
0003-01-01 3
0004-01-01 4
0005-01-01 5
0006-01-01 6
Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"
I got the same problem: again, just 6 rows are matched with 6 counties (image: plot STIDF 2).
Even without the order() call I get the same problem, with just 6 rows from the df and 6 features from the polygon df:
pm10.stf = STIDF(de, time, pm10.f)
> pm10.stf
An object of class "STIDF"
Slot "data":
month CID MMW10
1 01 1002 13.31264
2 02 1002 11.10590
3 03 1002 14.19649
4 04 1002 16.10512
5 05 1002 12.38511
6 06 1002 13.10104
Slot "sp":
class : SpatialPolygonsDataFrame
features : 6
extent : 8.108812, 10.24141, 47.5024, 48.86768 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 14
names : GID_0, NAME_0, GID_1, NAME_1, NL_NAME_1, GID_2, NAME_2, VARNAME_2, NL_NAME_2, TYPE_2, ENGTYPE_2, CC_2, HASC_2, CC_2.2
min values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.1_1, Alb-Donau-Kreis, NA, NA, Landkreis, District, 08115, DE.BW.AD, 8115
max values : DEU, Germany, DEU.1_1, Baden-Württemberg, NA, DEU.1.6_1, Bodenseekreis, NA, NA, Water body, Water body, 08435, DE.BW.BR, 8435
Slot "time":
timeIndex
0001-01-01 1
0002-01-01 2
0003-01-01 3
0004-01-01 4
0005-01-01 5
0006-01-01 6
Slot "endTime":
[1] "0001-01-01 GMT" "0002-01-01 GMT" "0003-01-01 GMT" "0004-01-01 GMT" "0005-01-01 GMT" "0006-01-01 GMT"
I got 6 rows for one county in the df, but 6 different polygon features. It seems that STIDF just takes the first 6 polygons of the polygon df.
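STIDF() is meant for irregular data in which every row carries its own location and time stamp, so sp, time and data are expected to have the same length (one element per observation). In the calls above, de has 403 polygons, time has 6 time stamps and pm10.f has 1092 rows; these lengths do not agree, which may explain why only 6 observations end up in the object. A sketch of the intended STIDF layout with made-up points, where each location is repeated once per observation via indexing:

```r
library(sp)
library(spacetime)

# Three hypothetical locations
locs <- SpatialPoints(cbind(x = c(7, 9, 11), y = c(50, 51, 52)))

# Four irregular observations: each row names its location and has its own time
obs <- data.frame(loc = c(1, 3, 2, 3), pm10 = c(10, 15, 12, 16))
obs_time <- as.POSIXct(c("2020-01-03", "2020-01-05", "2020-02-10", "2020-02-11"),
                       tz = "GMT")

# Repeat the geometry once per observation so that sp, time and data line up
sti <- STIDF(locs[obs$loc, ], obs_time, obs)

c(space = length(sti@sp), time = nrow(sti@time), rows = nrow(sti@data))
```

For county-by-month values on a complete grid, the full-grid class STFDF is the more natural fit; STIDF suits scattered observations such as individual station readings.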
First, I noticed that my shapefile had more features than there are actual districts, because it contains duplicate geometries ("DoubleGeoms") for some districts. So I aggregated the shapefile by district key (AGS, the official municipality key):
de2 <- raster::aggregate(de, by = "AGS")  # dissolve all features that share the same AGS
Then I realized that there was a logical error in my thinking: I have 401 districts and effectively 6 time steps (6 months), so my data frame has to have 401*6=2406 rows, one for each district and month. This means I had to adjust my data frame, so I took the 401 districts and expanded them by the 6 months:
df <- tidyr::expand_grid(KRS = unique(df$KRS), month = 1:6)
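The same expansion and merge, sketched in base R with made-up district keys and values: expand.grid() plays the role of tidyr::expand_grid(), and merge(all.x = TRUE) keeps districts without a station as NA rows, which STFDF() needs for a complete grid:

```r
# Hypothetical inputs: 3 districts and monthly means for only some of them
districts <- c("01001", "01002", "08115")
monthly <- data.frame(
  AGS   = c("01001", "08115", "01001"),
  month = c(1, 1, 2),
  MMW10 = c(13.3, 16.6, 11.1),
  stringsAsFactors = FALSE
)

# One row per district and month (here 3 x 2 = 6)
grid <- expand.grid(AGS = districts, month = 1:2, stringsAsFactors = FALSE)

# Left join: district-months without a measurement get NA
full <- merge(grid, monthly, by = c("AGS", "month"), all.x = TRUE)

# Order time first, then district, so that the spatial index moves fastest
full <- full[order(full$month, full$AGS), ]
nrow(full)   # 6
full$MMW10   # 13.3 NA 16.6 11.1 NA NA
```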
After adding the variables to this new data frame with merge() by district and month, I could finally use STFDF() from the spacetime package:
df.stf <- STFDF(de2, time, df[order(df$month, df$KRS), ])
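Sorting by month and district number lines the rows up with the polygons only if de2 is sorted by the same key in the same way. A safer variant, sketched here with made-up keys, orders the rows by each district's position in the polygon object via match() and checks the result before calling STFDF():

```r
# Hypothetical district keys in the order of the polygon object (e.g. de2$AGS)
sp_keys <- c("08115", "01001", "01002")

# Hypothetical full grid: one row per district and month
full <- expand.grid(AGS = c("01001", "01002", "08115"), month = 1:2,
                    stringsAsFactors = FALSE)
full$MMW10 <- c(13.3, NA, 16.6, 11.1, NA, NA)

# Order by month, then by the district's position in the polygon object
full <- full[order(full$month, match(full$AGS, sp_keys)), ]

# Within every month the keys must now follow the polygon order exactly
stopifnot(all(tapply(full$AGS, full$month,
                     function(k) identical(k, sp_keys))))
full$AGS
# "08115" "01001" "01002" "08115" "01001" "01002"
```

With the real objects, sp_keys would be the district key column of de2 and full the expanded data frame passed to STFDF().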