Search code examples
rdplyraggregatetidyr

Collapse rows from 0 to 0


For a dataset like this

    Incident.ID..                date           product
    INCFI0000029582     2014-09-25 08:39:45     foo
    INCFI0000029582     2014-09-25 08:39:48     bar 
    INCFI0000029582     2014-09-25 08:40:44     foo
    INCFI0000029582     2014-10-10 23:04:00     foo
    INCFI0000029587     2014-09-25 08:33:32     bar
    INCFI0000029587     2014-09-25 08:34:41     bar
    INCFI0000029587     2014-09-25 08:35:24     bar
    INCFI0000029587     2014-10-10 23:04:00     foo


df <- structure(list(Incident.ID.. = c("INCFI0000029582", "INCFI0000029582", 
"INCFI0000029582", "INCFI0000029582", "INCFI0000029587", "INCFI0000029587", 
"INCFI0000029587", "INCFI0000029587"), date = c("2014-09-25 08:39:45", 
"2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00", 
"2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24", 
"2014-10-10 23:04:00"), product = 
c("foo","bar","foo","foo","bar","bar","bar","foo")), 
class = "data.frame", row.names = c(NA, 
-8L))

I am calculating the rolling difference in time by id using the mutate function as follows

library(dplyr)
library(lubridate)
df1 <- df %>%
  group_by(Incident.ID..) %>%
  mutate(diff = c(0, diff(ymd_hms(date))))

This creates a column diff as follows

  Incident.ID..   date                 product    diff
  INCFI0000029582 2014-09-25 08:39:45  foo        0
  INCFI0000029582 2014-09-25 08:39:48  bar        3
  INCFI0000029582 2014-09-25 08:40:44  foo        56
  INCFI0000029582 2014-10-10 23:04:00  foo        1347796
  INCFI0000029587 2014-09-25 08:33:32  bar        0
  INCFI0000029587 2014-09-25 08:34:41  bar        69
  INCFI0000029587 2014-09-25 08:35:24  bar        43
  INCFI0000029587 2014-10-10 23:04:00  foo        1348116

Now my goal is to aggregate/collapse rows from zero to zero, with the expected final dataset like this

 Incident.ID..     DateMin              DateMax              product
 INCFI0000029582   2014-09-25 08:39:45  2014-10-10 23:04:00  foo,bar,foo,foo
 INCFI0000029587   2014-09-25 08:33:32  2014-10-10 23:04:00  bar,bar,bar,foo

I am not sure how to collapse rows as shown above with a min and max date column , I need help. Thanks in advance.


Solution

  • The group_by attribute remains after the mutate, so we summarise by the group to get the min, max of 'the 'date' and collapse the 'product' by pasteing the elements together (toString is a convenient wrapper for paste(., collapse=", "))

    df %>%
       group_by(Incident.ID..) %>%
       mutate(diff = c(0, diff(ymd_hms(date)))) %>% 
       summarise(DateMin = min(date), 
                 DateMax = max(date), 
                 product = toString(product))
    # A tibble: 2 x 4
    #  Incident.ID..   DateMin             DateMax             product           
    #  <chr>           <chr>               <chr>               <chr>             
    #1 INCFI0000029582 2014-09-25 08:39:45 2014-10-10 23:04:00 foo, bar, foo, foo
    #2 INCFI0000029587 2014-09-25 08:33:32 2014-10-10 23:04:00 bar, bar, bar, foo