Tags: r, time-series, interpolation

How to interpolate weekly data to hourly?


I have a dataset containing weekly observations from 2020-01-05 to 2022-12-25. It looks like the following:

   datetime     NO3   NO1   NO2   NO5   NO4 fyllingsgrad
   <date>     <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>
 1 2020-11-29 0.930 0.972 0.978 0.932 0.903        0.972
 2 2021-01-03 0.770 0.821 0.884 0.769 0.785        0.821
 3 2020-04-26 0.264 0.196 0.483 0.268 0.322        0.196
 4 2021-05-09 0.253 0.151 0.453 0.185 0.430        0.151
 5 2021-10-03 0.708 0.761 0.550 0.588 0.809        0.761
 6 2022-03-13 0.364 0.190 0.311 0.242 0.505        0.190
 7 2021-01-17 0.671 0.698 0.814 0.667 0.730        0.698
 8 2021-07-11 0.724 0.772 0.667 0.593 0.731        0.772
 9 2020-12-27 0.813 0.862 0.909 0.808 0.811        0.862
10 2021-06-20 0.693 0.717 0.659 0.506 0.648        0.717

I need to interpolate the NO2 variable from weekly to hourly time resolution. Does anyone know how this could be done?

I have tried multiple things, but none of them work. One example is the following code:

hour_dat <- data.frame(
  datetime = seq(as.POSIXct("2020-01-05"), as.POSIXct("2022-12-25"), by = "1 hour")
)

new_interp <- hour_dat %>% 
  mutate(NO2_interp = approx(x = mag_data$datetime, y = mag_data$NO2, xout = datetime, method = "linear")$y)

Solution

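  • approx only interpolates, it does not extrapolate, so hour_dat must lie within the range of dat (which is probably the case with your real data). datetime must also have the same class in both data frames, here POSIXct: a Date column counts days while POSIXct counts seconds, so mixing them typically puts every xout value outside the range of x. Just merge hour_dat to dat and use approx in an lapply.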

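    # Convert the dates to POSIXct so both data frames share one class
    dat$datetime <- as.POSIXct(dat$datetime)
    
    # Hourly grid, kept inside the range covered by dat
    hour_dat <- data.frame(datetime=seq(as.POSIXct("2020-11-29"), as.POSIXct("2021-06-20"), by="1 hour"))
    
    # Full outer join on datetime: hourly rows get NA in the value columns
    ndat <- merge(dat, hour_dat, all=TRUE) 
    
    # Interpolate every column except datetime (\(y) needs R >= 4.1)
    ndat[-1] <- lapply(ndat[-1], \(y) approx(ndat$datetime, y, xout=ndat$datetime)$y)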
    

    Note that this is linear interpolation.
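
    The interpolate-only behavior of approx can be seen on a toy example. This is a minimal sketch with made-up numbers (x, y, inside_only and clamped are illustrative names, not part of the question's data): with the default rule = 1, any xout outside the range of x comes back as NA, while rule = 2 repeats the value at the nearest end instead.

    ```r
    # Toy data: three known points (hypothetical values, not from the question)
    x <- c(1, 2, 3)
    y <- c(10, 20, 30)

    # Default rule = 1: xout values outside range(x) are returned as NA
    inside_only <- approx(x, y, xout = c(0, 1.5, 4))$y
    print(inside_only)  # NA 15 NA

    # rule = 2: xout values outside range(x) take the value at the nearest end
    clamped <- approx(x, y, xout = c(0, 1.5, 4), rule = 2)$y
    print(clamped)      # 10 15 30
    ```

    Whether carrying the end values outward is acceptable depends on the data; for the weekly series here it is probably safer to keep the hourly grid inside the observed range.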

    Gives

    head(ndat)
    #              datetime       NO3       NO1       NO2       NO5       NO4 fyllingsgrad
    # 1 2020-04-26 02:00:00 0.2640000 0.1960000 0.4830000 0.2680000 0.3220000    0.1960000
    # 2 2020-11-29 00:00:00 0.9298721 0.9718510 0.9779050 0.9318725 0.9028884    0.9718510
    # 3 2020-11-29 01:00:00 0.9300000 0.9720000 0.9780000 0.9320000 0.9030000    0.9720000
    # 4 2020-11-29 02:00:00 0.9298259 0.9718363 0.9778973 0.9318155 0.9028631    0.9718363
    # 5 2020-11-29 03:00:00 0.9296518 0.9716726 0.9777946 0.9316310 0.9027262    0.9716726
    # 6 2020-11-29 04:00:00 0.9294777 0.9715089 0.9776920 0.9314464 0.9025893    0.9715089
    

    We can quickly check graphically

    matplot(ndat[-1], type='l', lty=1)
    legend('topright', legend=names(ndat)[-1], lty=1, col=1:6, ncol=2, cex=.8, bty='n')
    


    Data:

    dat <- structure(list(datetime = c("2020-11-29", "2021-01-03", "2020-04-26", 
    "2021-05-09", "2021-10-03", "2022-03-13", "2021-01-17", "2021-07-11", 
    "2020-12-27", "2021-06-20"), NO3 = c(0.93, 0.77, 0.264, 0.253, 
    0.708, 0.364, 0.671, 0.724, 0.813, 0.693), NO1 = c(0.972, 0.821, 
    0.196, 0.151, 0.761, 0.19, 0.698, 0.772, 0.862, 0.717), NO2 = c(0.978, 
    0.884, 0.483, 0.453, 0.55, 0.311, 0.814, 0.667, 0.909, 0.659), 
        NO5 = c(0.932, 0.769, 0.268, 0.185, 0.588, 0.242, 0.667, 
        0.593, 0.808, 0.506), NO4 = c(0.903, 0.785, 0.322, 0.43, 
        0.809, 0.505, 0.73, 0.731, 0.811, 0.648), fyllingsgrad = c(0.972, 
        0.821, 0.196, 0.151, 0.761, 0.19, 0.698, 0.772, 0.862, 0.717
        )), class = "data.frame", row.names = c("1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "10"))
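
    If only NO2 is needed, as in the question, the merge step can be skipped. The following is a sketch under a few assumptions: it uses four of the weekly NO2 values from the data above, parses the dates in UTC (an assumption, chosen so that daylight saving time does not shift the weekly stamps against the hourly grid), and the names weekly, hourly and NO2_interp are only illustrative.

    ```r
    # Small subset of the weekly NO2 values from the question (dates as text)
    weekly <- data.frame(
      datetime = c("2020-11-29", "2020-12-27", "2021-01-03", "2021-01-17"),
      NO2      = c(0.978, 0.909, 0.884, 0.814)
    )

    # Parse in a fixed time zone (UTC is an assumption) and sort by time
    weekly$datetime <- as.POSIXct(weekly$datetime, tz = "UTC")
    weekly <- weekly[order(weekly$datetime), ]

    # Hourly grid restricted to the observed range, since approx() does not extrapolate
    hourly <- data.frame(
      datetime = seq(min(weekly$datetime), max(weekly$datetime), by = "1 hour")
    )

    # POSIXct values are seconds since the epoch, so both sides are on the same scale
    hourly$NO2_interp <- approx(
      x    = as.numeric(weekly$datetime),
      y    = weekly$NO2,
      xout = as.numeric(hourly$datetime)
    )$y

    head(hourly)
    ```

    Because the grid starts and ends on observed dates, NO2_interp should contain no NA values, and the hourly rows that coincide with a weekly observation reproduce it.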