
pandas: time difference in groupby


How can I calculate, for each id, the time difference between the current row and the next one in the dataset below:

time                    id
2012-03-16 23:50:00      1
2012-03-16 23:56:00      1
2012-03-17 00:08:00      1
2012-03-17 00:10:00      2
2012-03-17 00:12:00      2
2012-03-17 00:20:00      2
2012-03-20 00:43:00      3

and get the following result:

time                    id       tdiff
2012-03-16 23:50:00      1         6  
2012-03-16 23:56:00      1         12  
2012-03-17 00:08:00      1         NA
2012-03-17 00:10:00      2         2 
2012-03-17 00:12:00      2         8    
2012-03-17 00:20:00      2         NA 
2012-03-20 00:43:00      3         NA
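To make the question reproducible, the sample data above can be built as a DataFrame like this. This is a minimal sketch; the name `data` is chosen to match the code in the answer below, and the times are parsed to datetime up front.

```python
import pandas as pd

# Sample data from the question, with 'time' parsed to a datetime dtype
data = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-03-16 23:50:00",
        "2012-03-16 23:56:00",
        "2012-03-17 00:08:00",
        "2012-03-17 00:10:00",
        "2012-03-17 00:12:00",
        "2012-03-17 00:20:00",
        "2012-03-20 00:43:00",
    ], format="%Y-%m-%d %H:%M:%S"),
    "id": [1, 1, 1, 2, 2, 2, 3],
})
print(data.dtypes)
```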

Solution

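  • I see that you need the result in minutes, per id. Here is how to do it:

    use diff() in groupby. Note that diff() subtracts the previous row within each id, so each difference lands on the later row and the first row of each id is NaN: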

    import pandas as pd

    # first convert to datetime with the right format
    data['time'] = pd.to_datetime(data['time'], format='%Y-%m-%d %H:%M:%S')
    # difference to the previous row within each id, converted to minutes;
    # the first row of each id has no predecessor (NaT), which becomes NaN
    data['tdiff'] = data.groupby('id')['time'].diff() / pd.Timedelta(minutes=1)
    print(data)

    output

                     time  id  tdiff
    0 2012-03-16 23:50:00   1    NaN
    1 2012-03-16 23:56:00   1    6.0
    2 2012-03-17 00:08:00   1   12.0
    3 2012-03-17 00:10:00   2    NaN
    4 2012-03-17 00:12:00   2    2.0
    5 2012-03-17 00:20:00   2    8.0
    6 2012-03-20 00:43:00   3    NaN
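The output above puts each difference on the later row. To get exactly the layout asked for in the question (difference to the next row, NaN on the last row of each id), one option is to shift the times up by one row within each group and subtract. This is a sketch under the assumption that rows should be compared in time order, so it sorts by id and time first; diff() and shift() otherwise follow whatever order the rows happen to be in.

```python
import pandas as pd

# Rebuild the question's sample data
data = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
        "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
        "2012-03-20 00:43:00",
    ], format="%Y-%m-%d %H:%M:%S"),
    "id": [1, 1, 1, 2, 2, 2, 3],
})

# diff/shift work on row order, so make sure rows are ordered by id, then time
data = data.sort_values(["id", "time"]).reset_index(drop=True)

# time of the next row within the same id (NaT on each id's last row)
next_time = data.groupby("id")["time"].shift(-1)

# forward difference in minutes; NaT - time is NaT, which divides to NaN
data["tdiff"] = (next_time - data["time"]) / pd.Timedelta(minutes=1)
print(data)
```

For the sample data this gives 6, 12, NaN, 2, 8, NaN, NaN, matching the expected result in the question. Negating `data.groupby("id")["time"].diff(-1)` would give the same values.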