Search code examples
rdata.tablelag

How to follow group by time


I have a data frame as below:

user   time
____   ____
 1      2017-09-01 00:01:01
 1      2017-09-01 00:01:20
 1      2017-09-01 00:03:01
 1      2017-09-01 00:10:01
 1      2017-09-01 00:11:01
 2      2017-09-01 00:01:03
 2      2017-09-01 00:01:08
 2      2017-09-01 00:03:01   

From this data frame I want to create the follow group for each user like below:

user           time                      follow_group
____   ____________________             _____________                            
 1      2017-09-01 00:01:01                 1
 1      2017-09-01 00:01:20                 1  
 1      2017-09-01 00:03:01                 1 
 1      2017-09-01 00:10:01                 2
 1      2017-09-01 00:11:01                 2
 2      2017-09-01 00:01:03                 1
 2      2017-09-01 00:01:08                 1
 2      2017-09-01 00:03:01                 1

The follow group for each user changes when the time difference is greater than 5 minutes.

I tried by taking the lag and subtracting:

data[, previous_request_time:=c(NA, time[-.N]), by=user]

But that didn't seem to work. Any help is appreciated.


Solution

  • Just do a difftime operation and check if the difference is greater than 5 mins. Then a cumulative sum will give your group counter:

    dat[,
      follow_group := cumsum(difftime(time, shift(time, fill=-Inf), units="mins") > 5),
      by=user
    ]
    
    #   user                time follow_group
    #1:    1 2017-09-01 00:01:01            1
    #2:    1 2017-09-01 00:01:20            1
    #3:    1 2017-09-01 00:03:01            1
    #4:    1 2017-09-01 00:10:01            2
    #5:    1 2017-09-01 00:11:01            2
    #6:    2 2017-09-01 00:01:03            1
    #7:    2 2017-09-01 00:01:08            1
    #8:    2 2017-09-01 00:03:01            1
    

    You could similarly use just diff if you didn't want to be too explicit about the units:

    dat[, flwgrp := cumsum(c(Inf, diff(time)) > 5*60), by=user]