I have time series data containing monthly observations. I now want the given interpolate monthly values (preferred linear, cubic is fine) according to a data sequence (for eg. 15-day sequence). The intermediate goal should be to create a data frame with the date as the index and missing values for all the monthly observations.
We can consider these values starting from the start of the month, for eg. "Sep-16" indicate 2016-09-01. So we need to change the date format as well.
head(df)
# date sp1
# 1 Sep-16 2.58868
# 2 Oct-16 0.51014
# 3 Nov-16 -0.81165
# 4 Dec-16 -0.52575
# 5 Jan-17 -0.93884
# 6 Feb-17 -1.95657
What I want to do is to get sp1 values for the following date sequence:
DateSeq <- seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15)
I found a question here
Any help is appreciated, thanks.
df <- data.frame(date=c("Sep-16", "Oct-16", "Nov-16", "Dec-16", "Jan-17", "Feb-17",
"Mar-17", "Apr-17", "May-17" , "Jun-17", "Jul-17", "Aug-17", "Sep-17", "Oct-17",
"Nov-17", "Dec-17", "Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18",
"Jun-18", "Jul-18", "Aug-18", "Sep-18", "Oct-18", "Nov-18", "Dec-18",
"Jan-19", "Feb-19", "Mar-19", "Apr-19", "May-19", "Jun-19", "Jul-19",
"Aug-19", "Sep-19", "Oct-19", "Nov-19", "Dec-19", "Jan-20", "Feb-20",
"Mar-20", "Apr-20" , "May-20", "Jun-20", "Jul-20", "Aug-20"),
sp1=c(2.58868 , 0.51014 , -0.81165, -0.52575, -0.93884, -1.95657,
1.65728, 0.00522, -0.11818, -1.14689, -1.24314, -0.50178,
-1.60286, 0.04624, -0.37180, -0.16875, -0.95811, -0.79433,
-0.64041, -1.27894, -0.75469, -0.89954, -1.51727, -0.03980,
-0.76522, 0.11057, 0.13922, 0.45866, -0.70275, -1.37621, 0.89521,
-1.41637, -0.25305, -1.29723, 1.41120, -0.98463, -0.93980, -1.50555,
-1.37363, -2.33455, -0.73567, 0.30003, 1.38937, 1.56948,
-0.23575, -0.52387, 0.97928, 0.40289))
First, you would add a day, e.g. the first of the month to the dates and convert as.Date
. Then merge
with an expand.grid
on the seq.Date
ds
, and approx
the values to linearly interpolate. I use sp2
to demonstrate, but you can overwrite sp1
. Finally subset
the data frame on the ds
.
df$date <- as.Date(paste0(df$date, '-01'), '%b-%y-%d')
ds <- seq.Date(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15)
df <- merge(df, expand.grid(date=ds), all=TRUE)
df$sp2 <- approx(df$sp1, xout=seq_along(df$date))$y
res <- subset(df, date %in% ds)
head(res)
# date sp1 sp2
# 1 2016-09-01 2.58868 2.58868000
# 2 2016-09-16 NA 1.54941000
# 3 2016-10-01 0.51014 0.51014000
# 4 2016-10-16 NA 0.06954333
# 5 2016-10-31 NA -0.37105333
# 7 2016-11-15 NA -0.71635000
tail(res)
# date sp1 sp2
# 132 2020-05-13 NA -0.33179
# 133 2020-05-28 NA -0.42783
# 135 2020-06-12 NA -0.02282
# 136 2020-06-27 NA 0.47823
# 138 2020-07-12 NA 0.78715
# 139 2020-07-27 NA 0.59502
Notice, that "2020-07-30"
is not included in the sequence of 15 days, starting at Sep 1, 2016.
with(df, plot(date, sp2, type='o', ylim=range(df$sp1, na.rm=TRUE)))
with(df, points(date, sp1, pch=16))
legend('topright', pch=c(16, 1), legend=c('measured', 'interpolated'), horiz=TRUE)