Search code examples
rdataframegroup-bymergecombine

How to combine two data-frames by group and date


I'd like to combine two dataframes d1 and d2 in R. It should be grouped by "ID", but at the end there should only exist one column "date" so that further analysis are possible (It is important to know if the "number"-entry (d2) is linked to the "type"-entry (d1) or how close it is to it)

Lets say d1 and d2 is

d1 <- data.frame(ID = c(1,1,2,2,2,3), 
                 sex = c("male", "male", "male", "male", "male", "female"),
                 date = c("2016-06-21 08:15:00", "2016-06-21 23:00:00",
                          "2017-04-19 07:15:00", "2017-04-20 07:15:00", "2017-04-20 08:15:00",
                          "2017-08-20 23:15:00" ),
                 type = c("horse", "snake", "horse", "horse", "monkey", "fish"))

d2 <- data.frame(ID = c(1,1,1,2,2,3,3), 
                 date = c("2016-06-20 08:15:00", "2016-06-21 22:15:00", "2016-06-22 08:45:00",
                          "2017-04-19 07:15:00", "2017-04-19 09:30:00",
                          "2017-08-19 08:15:00", "2017-08-21 06:30:00"),
                 number = c(65,234,64,234,6243,23,243))

My desired output would be

d12 <- data.frame(ID = c(1,1,1,1,1,2,2,2,2,3,3,3),
                  sex = c(NA, "male", NA, "male", NA, "male", NA, "male", "male" , NA, "female", NA),
                  date = c("2016-06-20 08:15:00","2016-06-21 08:15:00", "2016-06-21 22:15:00", "2016-06-21 23:00:00", "2016-06-22 08:45:00",
                          "2017-04-19 07:15:00", "2017-04-19 09:30:00", "2017-04-20 07:15:00", "2017-04-20 08:15:00",
                          "2017-08-19 08:15:00", "2017-08-20 23:15:00", "2017-08-21 06:30:00" ),
                  type = c(NA, "horse", NA, "snake", NA, "horse", NA, "horse", "monkey", NA, "fish", NA),
                  number= c(65, NA, 234, NA, 64, 234, 6243, NA, NA, 23, NA, 243))

I tried doing this

d12 <- full_join(
  d1 %>% group_by(ID) %>% mutate(time=row_number()),
  d2 %>% group_by(ID) %>% mutate(time=row_number()),
by = c("ID", "time"))

But more or less I just added some columns, I don't really come closer to my desired output. I am very looking forward to your help.


Solution

  • I'm still a little bit puzzled about this question. You could try

    library(dplyr)
    
    d1 %>% 
      full_join(d2, by = c("ID", "date")) %>% 
      mutate(date = as.POSIXct(date)) %>% 
      arrange(ID, date) %>% 
      tibble() # this line isn't really necessary
    

    This returns

          ID sex    date                type   number
       <dbl> <chr>  <dttm>              <chr>   <dbl>
     1     1 NA     2016-06-20 08:15:00 NA         65
     2     1 male   2016-06-21 08:15:00 horse      NA
     3     1 NA     2016-06-21 22:15:00 NA        234
     4     1 male   2016-06-21 23:00:00 snake      NA
     5     1 NA     2016-06-22 08:45:00 NA         64
     6     2 male   2017-04-19 07:15:00 horse     234
     7     2 NA     2017-04-19 09:30:00 NA       6243
     8     2 male   2017-04-20 07:15:00 horse      NA
     9     2 male   2017-04-20 08:15:00 monkey     NA
    10     3 NA     2017-08-19 08:15:00 NA         23
    11     3 female 2017-08-20 23:15:00 fish       NA
    12     3 NA     2017-08-21 06:30:00 NA        243