Search code examples
rdataframemergetime-seriesmismatch

How to collate two time series dataframes that do not have matching times and contain different columns


I have two dataframes

df1 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:00","2022-07-29 00:05:00","2022-07-29 00:10:00","2022-07-29 00:15:00","2022-07-29 00:20:00")), c(1,2,3,4,5)), c("timeStamp", "value"))
df2 <- setNames(data.frame(c(as.POSIXct("2022-07-29 00:00:05","2022-07-29 00:05:05","2022-07-29 00:20:05")), c("a","b","c")), c("timeStamp", "text"))

I want to merge them into a single data frame by timeStamp, where the times are matched in chronological order, and the mismatching columns are simply filled in with NA. So it would read

timestamp              value     text
"2022-07-29 00:00:00"  1         NA
"2022-07-29 00:00:05"  NA        a
...

I can't for my life figure out why simply merge(x,y,by=) doesn't work for this. neither does left_join. It seems so simple and I can't seem to figure it out.

Also, a bonus would be to make it work this way as well, in which each time there is a change in text, all the rows below it remain the last text until the text changes again prompted by df2:

timestamp              value     text
"2022-07-29 00:00:00"  1         NA
"2022-07-29 00:00:05"  NA        a
"2022-07-29 00:05:00"  2         a
"2022-07-29 00:05:05"  NA        b
"2022-07-29 00:10:00"  3         b
"2022-07-29 00:15:00"  4         b
"2022-07-29 00:20:00"  5         b
"2022-07-29 00:20:05"  NA        c

Thanks


Solution

  • You need to do a full join.

    tidyverse solution:

    df1 %>% 
      full_join(df2, c("timeStamp" = "timeStamp")) %>% 
      arrange(timeStamp) %>% 
      tidyr::fill(text)
    
                timeStamp value text
    1 2022-07-29 00:00:00     1 <NA>
    2 2022-07-29 00:00:05    NA    a
    3 2022-07-29 00:05:00     2    a
    4 2022-07-29 00:05:05    NA    b
    5 2022-07-29 00:10:00     3    b
    6 2022-07-29 00:15:00     4    b
    7 2022-07-29 00:20:00     5    b
    8 2022-07-29 00:20:05    NA    c
    

    Base R merge:

    merge(df1, df2, by = c("timeStamp" = "timeStamp"), all = T)