Search code examples
rtimemergeintervals

Merge two data frames over ID and time interval


I am dealing with the following problem: I have 2 data frames, a and b:

#dataframe a
ID1 <- c("1", "1", "1", "1", "2", "2", "2")
time<- c("2022-04-12 08:52", "2022-04-12 15:34", "2022-04-12 16:45", "2022-04-12 22:23", "2022-04-12 02:15", "2022-04-12 05:24", "2022-04-12 14:55")
a <- data.frame(ID1, time)
a$time <- as.POSIXct(a$time, origin="1970-01-01")
a
  ID1  time
   1   2022-04-12 08:52:00
   1   2022-04-12 15:34:00
   1   2022-04-12 16:45:00
   1   2022-04-12 22:23:00
   2   2022-04-12 02:15:00
   2   2022-04-12 05:24:00
   2   2022-04-12 14:55:00 

       
    
#dataframe b        
ID2 <- c("1", "1", "1", "1", "2", "2", "2", "2")
start <- c("2022-04-12 00:00", "2022-04-12 08:00", "2022-04-12 17:00", "2022-04-12 18:00", "2022-04-12 00:00", "2022-04-12 15:00", "2022-04-12 16:00", "2022-04-12 19:00")
end <- c("2022-04-12 08:00", "2022-04-12 17:00", "2022-04-12 18:00", "2022-04-12 00:00", "2022-04-12 15:00 ", "2022-04-12 16:00", "2022-04-12 19:00", "2022-04-12 00:00")
activity <- c("At home", "Work", "Travel", "Home", "Home", "Travel", "Work", "Home")        
b <- data.frame(ID2, start, end, activity)        
b$start <- as.POSIXct(b$start,  origin="1970-01-01")        
b$end <- as.POSIXct(b$end, origin="1970-01-01")     
b
  ID2  start               end                  activity
   1   2022-04-12 00:00:00 2022-04-12 08:00:00  At home
   1   2022-04-12 08:00:00 2022-04-12 17:00:00     Work
   1   2022-04-12 17:00:00 2022-04-12 18:00:00   Travel
   1   2022-04-12 18:00:00 2022-04-12 00:00:00     Home
   2   2022-04-12 00:00:00 2022-04-12 15:00:00     Home
   2   2022-04-12 15:00:00 2022-04-12 16:00:00   Travel
   2   2022-04-12 16:00:00 2022-04-12 19:00:00     Work
   2   2022-04-12 19:00:00 2022-04-12 00:00:00     Home

  

I want to merge over ID and over point in time (when a happens to occur in b (between start and end). I would like to end up with a data frame looking like this:

ID3 <- c("1", "1", "1", "1", "2", "2", "2")
time3<- c("2022-04-12 08:52", "2022-04-12 15:34", "2022-04-12 16:45", "2022-04-12 22:23", "2022-04-12 02:15", "2022-04-12 05:24", "2022-04-12 14:55")
activity3 <- c( "Work", "Work", "Work", "Home", "Home", "Home", "Home")     
result <- data.frame(ID3, time3, activity3)
result
  ID3  time3                 activity3
   1   2022-04-12 08:52      Work
   1   2022-04-12 15:34      Work
   1   2022-04-12 16:45      Work
   1   2022-04-12 22:23      Home
   2   2022-04-12 02:15      Home
   2   2022-04-12 05:24      Home
   2   2022-04-12 14:55      Home

I am very thankful for any help on this topic.


Solution

  • First of all, you need to change end column of dataframe b, lines 4 and 8. Since you're assuming that a new day is starting, day is not 2022-04-12 00:00:00, but 2022-04-13 00:00:00.

    If you prefer, you can change lines 4 and 8, column end from 2022-04-12 00:00:00 to 2022-04-12 23:59:59. Both options are fine.

    Once you've fixed time discrepancy, follow this code:

    ID3 <- c();
    time3 <- c();
    activt <- c();
    values <- c();
    for (ide_number in unique(a$ID1)){ # let's start subseting by ID number
      c <- subset(a,a$ID1==ide_number)
      d <- subset(b,b$ID2==ide_number)
      for (i in 1:nrow(c)){ # then, find the row that matches time condition
        condition <- which(d$start<=c$time[i] & d$end>c$time[i]) # find rows
        values <- append(values,condition) # rows in a vector
        activt <- append(activt,d$activity[condition]) # activity vector
        time3 <- append(time3,as.character(c$time[i])) # time vector
        ID3 <- append(ID3,ide_number) # ID vector
      }
    }
    out <- as.data.frame(cbind(ID3,time3,activt)) # just build your output
    

    Output:

    > out
      ID3               time3 activt
    1   1 2022-04-12 08:52:00   Work
    2   1 2022-04-12 15:34:00   Work
    3   1 2022-04-12 16:45:00   Work
    4   1 2022-04-12 22:23:00   Home
    5   2 2022-04-12 02:15:00   Home
    6   2 2022-04-12 05:24:00   Home
    7   2 2022-04-12 14:55:00   Home
    >