Tags: python, sql, pandas, window-functions

Create groups of 3 records within each ID group


I want to create groups of 3 consecutive records within a group that already exists in my dataset (the ID column). The data is sorted by Datetime and by that existing group.

I also want the maximum Datetime within each group of 3 records.

The example below shows what I need:

Datetime             ID  Group of 3  Max Time in Group of 3
2022-08-10 12:12:11  0   1           2022-08-10 12:13:11
2022-08-10 12:12:14  0   1           2022-08-10 12:13:11
2022-08-10 12:13:11  0   1           2022-08-10 12:13:11
2022-08-10 12:15:11  0   2           2022-08-10 12:15:11
2022-08-10 12:18:11  1   3           2022-08-10 12:21:11
2022-08-10 12:20:11  1   3           2022-08-10 12:21:11
2022-08-10 12:21:11  1   3           2022-08-10 12:21:11
2022-08-10 12:24:11  2   4           2022-08-10 12:29:11
2022-08-10 12:29:11  2   4           2022-08-10 12:29:11
2022-08-10 12:34:11  3   5           2022-08-10 12:34:11

Any SQL or Pandas solution will be appreciated. I tried using SQL window functions but couldn't get the logic right.


Solution

  • Here's a SQL solution that uses row_number() and integer division to split each ID into consecutive groups of three, then takes the max Datetime within each group with a windowed max().

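    select  Datetime
           ,ID
           ,max(Datetime) over(partition by ID, grp3) as mx_grp3
    from   (
           -- Number rows 1, 2, 3, ... within each ID in Datetime order.
           -- Integer division (SQL Server, PostgreSQL, SQLite) maps rows 1-3 to 0, 4-6 to 1, ...
           select *
                  ,(row_number() over(partition by ID order by Datetime) - 1) / 3 as grp3
           from   t
           ) numbered
    order by ID, Datetime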
    
    Datetime                 ID  mx_grp3
    2022-08-10 12:12:11.000  0   2022-08-10 12:13:11.000
    2022-08-10 12:12:14.000  0   2022-08-10 12:13:11.000
    2022-08-10 12:13:11.000  0   2022-08-10 12:13:11.000
    2022-08-10 12:15:11.000  0   2022-08-10 12:15:11.000
    2022-08-10 12:18:11.000  1   2022-08-10 12:21:11.000
    2022-08-10 12:20:11.000  1   2022-08-10 12:21:11.000
    2022-08-10 12:21:11.000  1   2022-08-10 12:21:11.000
    2022-08-10 12:24:11.000  2   2022-08-10 12:29:11.000
    2022-08-10 12:29:11.000  2   2022-08-10 12:29:11.000
    2022-08-10 12:34:11.000  3   2022-08-10 12:34:11.000

    Fiddle
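To try the query without a database server, here is a minimal sketch that runs it against an in-memory SQLite table using Python's standard library. Assumptions: the table is named `t`, `Datetime` is stored as ISO-8601 text (so string comparison in `order by` and `max()` matches chronological order), and the bundled SQLite is 3.25 or later (needed for window functions). It also adds a hypothetical `grp_global` column built with `dense_rank()` to reproduce the question's running "Group of 3" number (1 to 5); that only works because IDs increase over time in the sample.

```python
import sqlite3

# Sample data from the question. Datetime is ISO-8601 text, so lexicographic
# order equals chronological order for ORDER BY and max().
ROWS = [
    ("2022-08-10 12:12:11", 0), ("2022-08-10 12:12:14", 0),
    ("2022-08-10 12:13:11", 0), ("2022-08-10 12:15:11", 0),
    ("2022-08-10 12:18:11", 1), ("2022-08-10 12:20:11", 1),
    ("2022-08-10 12:21:11", 1), ("2022-08-10 12:24:11", 2),
    ("2022-08-10 12:29:11", 2), ("2022-08-10 12:34:11", 3),
]

# Same query as the answer, plus a hypothetical grp_global column:
# dense_rank() over (ID, grp3) numbers the groups 1, 2, 3, ... overall.
QUERY = """
select  Datetime
       ,ID
       ,dense_rank() over(order by ID, grp3) as grp_global
       ,max(Datetime) over(partition by ID, grp3) as mx_grp3
from   (
       select *
              ,(row_number() over(partition by ID order by Datetime) - 1) / 3 as grp3
       from   t
       ) numbered
order by ID, Datetime
"""


def run_query(rows):
    """Load rows into an in-memory table `t` and return the query result."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (Datetime text, ID integer)")
        con.executemany("insert into t values (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(row)
```

Note that `/` acts as integer division here only because both operands are integers. In MySQL, for example, `/` returns a decimal, so `DIV` would be needed instead.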
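For the pandas side of the question, a minimal sketch of the same idea: `groupby().cumcount()` plays the role of `row_number() - 1`, floor division by 3 gives the group within each ID, and `transform("max")` broadcasts each group's latest timestamp back to its rows. The function name `add_groups_of_three` and the columns `grp3`, `group_of_3`, and `max_time_in_group` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-08-10 12:12:11", "2022-08-10 12:12:14", "2022-08-10 12:13:11",
        "2022-08-10 12:15:11", "2022-08-10 12:18:11", "2022-08-10 12:20:11",
        "2022-08-10 12:21:11", "2022-08-10 12:24:11", "2022-08-10 12:29:11",
        "2022-08-10 12:34:11",
    ]),
    "ID": [0, 0, 0, 0, 1, 1, 1, 2, 2, 3],
})


def add_groups_of_three(frame: pd.DataFrame) -> pd.DataFrame:
    """Split each ID into consecutive chunks of 3 rows and add the chunk's max Datetime."""
    out = frame.sort_values(["ID", "Datetime"]).reset_index(drop=True)
    # Position of each row within its ID (0, 1, 2, ...), floor-divided by 3:
    # rows 0-2 -> chunk 0, rows 3-5 -> chunk 1, and so on.
    out["grp3"] = out.groupby("ID").cumcount() // 3
    # 1-based running group number over (ID, grp3) keys, in sorted key order.
    out["group_of_3"] = out.groupby(["ID", "grp3"]).ngroup() + 1
    # Latest timestamp in each chunk, repeated on every row of that chunk.
    out["max_time_in_group"] = out.groupby(["ID", "grp3"])["Datetime"].transform("max")
    return out


print(add_groups_of_three(df))
```

Because `ngroup()` numbers groups in sorted key order, `group_of_3` matches the question's 1 to 5 numbering only when ascending ID is also chronological, as in the sample.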