Search code examples
rdplyrdifftime

R Difference in time between rows


I've triangulated information from other SO answers for the below code, but getting stuck with an error message. Searched SO for similar errors and resolutions but haven't been able to figure it out, so help is appreciated.

For every group ("id"), I want to get the difference between the start times for consecutive rows.

Reproducible data:

require(dplyr)
df <-data.frame(id=as.numeric(c("1","1","1","2","2","2")), 
            start= c("1/31/17 10:00","1/31/17 10:02","1/31/17 10:45", 
                             "2/10/17 12:00", "2/10/17 12:20","2/11/17 09:40"))
time <- strptime(df$start, format = "%m/%d/%y %H:%M")
df %>%
group_by(id)%>%
mutate(diff = time - lag(time),
     diff_mins = as.numeric(diff, units = 'mins'))

Gets me error:

Error in mutate_impl(.data, dots) : Column diff must be length 3 (the group size) or one, not 6 In addition: Warning message: In unclass(time1) - unclass(time2) : longer object length is not a multiple of shorter object length


Solution

  • You can use lag and difftime (per Hadley):

    df %>%
      mutate(time = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
      group_by(id) %>%
      mutate(diff = difftime(time, lag(time)))
    
    # A tibble: 6 x 4
    # Groups:   id [2]
         id start         time                diff  
      <dbl> <fct>         <dttm>              <time>
    1    1. 1/31/17 10:00 2017-01-31 10:00:00 <NA>  
    2    1. 1/31/17 10:02 2017-01-31 10:02:00 2     
    3    1. 1/31/17 10:45 2017-01-31 10:45:00 43    
    4    2. 2/10/17 12:00 2017-02-10 12:00:00 <NA>  
    5    2. 2/10/17 12:20 2017-02-10 12:20:00 20    
    6    2. 2/11/17 09:40 2017-02-11 09:40:00 1280