Search code examples
mysqlsqlranalytics

Calculating time difference by groups


I have a question about calculating the time difference for different sources and targets.

Table1:         

Source      Target     Time                   TimeDif(wrong) (right)
1.2.3.4     2.3.4.5    2012-01-03 21:50:40    3               3
1.2.3.4     2.3.4.5    2014-01-03 21:50:43    5               5
1.2.3.4     2.3.4.5    2014-01-03 21:50:48    3               NULL
2.2.2.2     4.4.4.4    2014-01-03 21:50:51    3               4
2.2.2.2     4.4.4.4    2014-01-03 21:50:55    4               4
2.2.2.2     4.4.4.4    2014-01-03 21:50:59    4               NULL
....        ....       ......

Right now I calculate the time difference with:

diffTime <- difftime(time[1:(length(time)-1)] , time[2:length(time)]) * -1

The problem is that the time difference by calculating this method are not correct. That means the method calculate the whole column and does not make any decision between different sources and targets. I'm not sure if this problem can be fixed by the package (sqldf) to query and to group the data together. The query should also implement the method, but I think that is not possible. So would be nice if you have any solutions.


Solution

  • Supposing you want to do stuff in R, you need a grouping function. With for example group_by from dplyr you can do that:

    library(dplyr)
    dat %>% 
      group_by(Source, Target) %>% 
      mutate(tdif = lead(Time) - Time)
    

    the result:

       Source  Target                Time          tdif
       <fctr>  <fctr>              <dttm>        <time>
    1 1.2.3.4 2.3.4.5 2012-01-03 21:50:40 63158403 secs
    2 1.2.3.4 2.3.4.5 2014-01-03 21:50:43        5 secs
    3 1.2.3.4 2.3.4.5 2014-01-03 21:50:48       NA secs
    4 2.2.2.2 4.4.4.4 2014-01-03 21:50:51        4 secs
    5 2.2.2.2 4.4.4.4 2014-01-03 21:50:55        4 secs
    6 2.2.2.2 4.4.4.4 2014-01-03 21:50:59       NA secs
    

    Note that the first tdiff value is much larger than what you stated in the intended output. This is richt and due to the fact that the date of the first timestamp is from 2012, while the other timestamps are from 2014.