Tags: r, dplyr, intervals

How to fill missing time intervals


I have a dataframe with measurements taken at different intervals:

df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

Sometimes the intervals are contiguous, i.e., the starttime_ms of the next measurement equals the endtime_ms of the previous one. More often, however, there are gaps between the intervals. Whenever there is such a gap, I need to insert a row into df that records when the gap starts and when it ends. The closest I have come to a solution so far is detecting the gaps and measuring their duration:

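library(dplyr)
# Gap between the end of each interval and the start of the next one;
# NA where the next interval starts exactly where this one ends
df$gap <- ifelse(lead(df$starttime_ms) == df$endtime_ms,
                 NA,
                 lead(df$starttime_ms) - df$endtime_ms)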

However, that is still far from the desired output:

   A_aoi starttime_ms endtime_ms 
1     C           49        1981
2     C         1981        6115
3    NA         6115        6847
4     C         6847        7048
5    NA         7048        7180
6     B         7180       10080
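What the desired output has that the input lacks is that every interval ends exactly where the next one begins. A minimal base R sketch of a check for that property, useful for validating any candidate solution; the helper `is_contiguous` is hypothetical (not part of dplyr or data.table) and assumes the rows are already sorted by starttime_ms:

```r
# Hypothetical helper: TRUE when each interval ends exactly where the next begins
# (assumes rows are sorted by starttime_ms)
is_contiguous <- function(d) {
  n <- nrow(d)
  if (n < 2) return(TRUE)
  all(d$endtime_ms[-n] == d$starttime_ms[-1])
}

df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

desired <- data.frame(
  A_aoi = c("C", "C", NA, "C", NA, "B"),
  starttime_ms = c(49, 1981, 6115, 6847, 7048, 7180),
  endtime_ms = c(1981, 6115, 6847, 7048, 7180, 10080)
)

is_contiguous(df)      # FALSE: gaps after 6115 and after 7048
is_contiguous(desired) # TRUE
```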

Solution

  • You could use the data.table package as follows:

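    library(data.table)
    
    # Every distinct boundary time (starts and ends), in ascending order
    unq <- sort(unique(setDT(df)[, c(starttime_ms, endtime_ms)]))
    
    # Join each consecutive pair of boundaries back onto df; pairs that match
    # no existing interval come back with A_aoi = NA, i.e. the gap rows
    df[.(unq[-length(unq)], unq[-1]), on=c("starttime_ms", "endtime_ms")]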
    
    # A_aoi starttime_ms endtime_ms     
    #     C           49       1981    
    #     C         1981       6115     
    #  <NA>         6115       6847    
    #     C         6847       7048   
    #  <NA>         7048       7180    
    #     B         7180      10080
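An alternative that stays closer to the `lead()` idea in the question is to build the gap rows directly with dplyr and append them. This is a minimal sketch, assuming df is sorted by starttime_ms and the intervals do not overlap; `next_start`, `gaps`, and `result` are illustrative names, not part of either answer:

```r
library(dplyr)

df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

# One row per gap: it starts where an interval ends and ends where the
# next interval starts (assumes df is sorted by starttime_ms)
gaps <- df %>%
  mutate(next_start = lead(starttime_ms)) %>%
  filter(!is.na(next_start), next_start > endtime_ms) %>%
  transmute(A_aoi = NA_character_,
            starttime_ms = endtime_ms,
            endtime_ms = next_start)

# Append the gap rows and restore chronological order
result <- bind_rows(df, gaps) %>%
  arrange(starttime_ms)

result
```

A row is created only where `lead(starttime_ms)` is strictly greater than `endtime_ms`, so contiguous intervals are left untouched and the last interval (which has no successor) never produces a gap row.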