Tags: r, time-series, interpolation, sampling

Linear interpolation or resampling in R


I have a question about interpolation. I have two columns: the first is time in seconds, the second is sea level. Most of the examples I have found use a date column (e.g. 1970-11-11), but my records are in seconds and I want to linearly interpolate them to minutes. The original sampling interval is 0.1 second. Any suggestions as to which package is best? The attempt below generates a big matrix but does not reduce the number of values as expected. The format is just two columns. I want to use the result in a further analysis, with data sampled every 1 minute instead of every 0.1 second.

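set.seed(1)
time <- seq(0, 180, by = 0.1)   # 0 to 180 s in 0.1 s steps (1801 values)
sl <- runif(1801, -0.1, 4.0)    # simulated sea level
data1 <- cbind2(time, sl)       # two-column matrix: time, sea level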

#Output needed...
time(min)   sl(cm)


#Examples tried:

library(zoo)
time <- data1[, 1]
SL <- data1[, 2]
# Empty zoo series indexed every 30 s
seq1 <- zoo(order.by = seq(min(time), max(time), by = 30))

mer1 <- merge(zoo(x = data1[, 1:2], order.by = time), seq1)
#Linear interpolation
dataL <- na.approx(mer1)

Solution

  • Here's one solution. This approach does not use any linear interpolation, but takes the average centered on each minute.

    library(dplyr) # for group_by and summarize
    colnames(data1) <- c("time", "sl")  # makes it easier to call variables by names
    data1 <- as.data.frame(data1)  
    data1$minute <- round(data1$time/60,0)  # nearest whole minute for each reading
    head(data1)
    #  time        sl minute
    # 1  0.0 0.9885855      0
    # 2  0.1 1.4257080      0
    # 3  0.2 2.2486988      0
    # 4  0.3 3.6236519      0
    # 5  0.4 0.7268959      0
    # 6  0.5 3.5833977      0
    
    data_by_minute <- data1 %>%
          group_by(minute) %>%
          summarize(sl_avg = mean(sl))
    data_by_minute
    
    # # A tibble: 4 x 2
    #   minute sl_avg
    #    <dbl>  <dbl>
    # 1      0   1.91
    # 2      1   1.98
    # 3      2   1.87
    # 4      3   1.96
    

    An alternative approach if you just want to take the actual readings once a minute, rather than computing the average:

    data1[data1$time%%60==0,]  # keeps only the observations that fall exactly on a minute; drops everything else
    #      time sl
    # 1       0 0.9885855
    # 601    60 3.2384322
    # 1201  120 1.4027590
    # 1801  180 0.1525986
    

    Or, if you are looking for a smoothed estimate at each minute rather than a raw reading, you could fit a loess model:

    minutes <- time/60  # convert the time variable from seconds to minutes
    mod_loess <- loess(sl~minutes) # fit a loess model of sl as a function of time; this is essentially a smoothed version of your sl data
    Minute <- c(0,1,2,3)  # minutes for which you want a prediction
    SL_Preds <- predict(mod_loess, data.frame(minutes = Minute))  # calculate values from the model
    
    tableA <- cbind(Minute, SL_Preds)
    tableA  # 4 x 2 matrix: Minute and the smoothed sl estimate at that minute
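
Since the question asks specifically for linear interpolation, a minimal base-R sketch with `approx()` might look like the following. It rebuilds the question's simulated data; the names `time_s`, `target_s`, `interp`, and `result` are illustrative and not from the original post. Because the minute marks here coincide with existing sample times, the interpolated values should equal the readings at those times. `approx()` only blends neighbouring readings when a target time falls between samples.

```r
# Rebuild the example data from the question (0.1 s sampling over 180 s).
set.seed(1)
time_s <- seq(0, 180, by = 0.1)
sl <- runif(length(time_s), -0.1, 4.0)

# Target grid: one value per minute, expressed in seconds (assumed layout).
target_s <- seq(0, 180, by = 60)

# approx() linearly interpolates sl at the requested times.
interp <- approx(x = time_s, y = sl, xout = target_s, method = "linear")

# Two-column result: time in minutes and interpolated sea level.
result <- data.frame(time_min = interp$x / 60, sl = interp$y)
result
```

To resample onto a different grid, for example every 30 seconds, only `target_s` needs to change; the rest of the sketch stays the same.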