Search code examples
rdatetimemergeinterpolationlinear-interpolation

Time series interpolation: monthly data to certain day frequency data


I have time series data containing monthly observations. I now want the given interpolate monthly values (preferred linear, cubic is fine) according to a data sequence (for eg. 15-day sequence). The intermediate goal should be to create a data frame with the date as the index and missing values for all the monthly observations.

We can consider these values starting from the start of the month, for eg. "Sep-16" indicate 2016-09-01. So we need to change the date format as well.

head(df)
#     date      sp1
# 1 Sep-16  2.58868
# 2 Oct-16  0.51014
# 3 Nov-16 -0.81165
# 4 Dec-16 -0.52575
# 5 Jan-17 -0.93884
# 6 Feb-17 -1.95657

What I want to do is to get sp1 values for the following date sequence:

DateSeq <- seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15) 

I found a question here

Any help is appreciated, thanks.

Data

df <- data.frame(date=c("Sep-16", "Oct-16", "Nov-16", "Dec-16", "Jan-17", "Feb-17", 
                        "Mar-17", "Apr-17", "May-17" , "Jun-17", "Jul-17", "Aug-17", "Sep-17", "Oct-17", 
                        "Nov-17", "Dec-17", "Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", 
                        "Jun-18", "Jul-18", "Aug-18", "Sep-18", "Oct-18", "Nov-18", "Dec-18", 
                        "Jan-19", "Feb-19", "Mar-19", "Apr-19", "May-19", "Jun-19", "Jul-19", 
                        "Aug-19", "Sep-19", "Oct-19", "Nov-19", "Dec-19",  "Jan-20", "Feb-20", 
                        "Mar-20", "Apr-20" , "May-20", "Jun-20", "Jul-20", "Aug-20"),
                 sp1=c(2.58868 , 0.51014 , -0.81165, -0.52575, -0.93884, -1.95657, 
                       1.65728,  0.00522, -0.11818, -1.14689, -1.24314, -0.50178, 
                       -1.60286, 0.04624, -0.37180, -0.16875, -0.95811, -0.79433, 
                       -0.64041, -1.27894, -0.75469, -0.89954, -1.51727, -0.03980, 
                       -0.76522,  0.11057,  0.13922, 0.45866, -0.70275, -1.37621,  0.89521, 
                       -1.41637, -0.25305, -1.29723, 1.41120, -0.98463, -0.93980, -1.50555, 
                       -1.37363, -2.33455, -0.73567, 0.30003,  1.38937,  1.56948, 
                       -0.23575, -0.52387,  0.97928,  0.40289))

Solution

  • First, you would add a day, e.g. the first of the month to the dates and convert as.Date. Then merge with an expand.grid on the seq.Date ds, and approx the values to linearly interpolate. I use sp2 to demonstrate, but you can overwrite sp1. Finally subset the data frame on the ds.

    df$date <- as.Date(paste0(df$date, '-01'), '%b-%y-%d')
    ds <- seq.Date(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15)
    df <- merge(df, expand.grid(date=ds), all=TRUE)
    df$sp2 <- approx(df$sp1, xout=seq_along(df$date))$y
    
    res <- subset(df, date %in% ds)
    
    head(res)
    #         date     sp1         sp2
    # 1 2016-09-01 2.58868  2.58868000
    # 2 2016-09-16      NA  1.54941000
    # 3 2016-10-01 0.51014  0.51014000
    # 4 2016-10-16      NA  0.06954333
    # 5 2016-10-31      NA -0.37105333
    # 7 2016-11-15      NA -0.71635000
    
    tail(res)
    #           date sp1      sp2
    # 132 2020-05-13  NA -0.33179
    # 133 2020-05-28  NA -0.42783
    # 135 2020-06-12  NA -0.02282
    # 136 2020-06-27  NA  0.47823
    # 138 2020-07-12  NA  0.78715
    # 139 2020-07-27  NA  0.59502
    

    Notice, that "2020-07-30" is not included in the sequence of 15 days, starting at Sep 1, 2016.

    Visualization

    with(df, plot(date, sp2, type='o', ylim=range(df$sp1, na.rm=TRUE)))
    with(df, points(date, sp1, pch=16))
    legend('topright', pch=c(16, 1), legend=c('measured', 'interpolated'), horiz=TRUE)
    

    enter image description here