I want to split each existing group in my dataset (the ID column) into sub-groups of 3 consecutive records. The data is sorted by that existing group and by datetime.
I also want the maximum datetime within each sub-group of 3 records.
The example below shows what I need:
Datetime | ID | Group of 3 | Max Time in group of 3 |
---|---|---|---|
2022-08-10 12:12:11 | 0 | 1 | 2022-08-10 12:13:11 |
2022-08-10 12:12:14 | 0 | 1 | 2022-08-10 12:13:11 |
2022-08-10 12:13:11 | 0 | 1 | 2022-08-10 12:13:11 |
2022-08-10 12:15:11 | 0 | 2 | 2022-08-10 12:15:11 |
2022-08-10 12:18:11 | 1 | 3 | 2022-08-10 12:21:11 |
2022-08-10 12:20:11 | 1 | 3 | 2022-08-10 12:21:11 |
2022-08-10 12:21:11 | 1 | 3 | 2022-08-10 12:21:11 |
2022-08-10 12:24:11 | 2 | 4 | 2022-08-10 12:29:11 |
2022-08-10 12:29:11 | 2 | 4 | 2022-08-10 12:29:11 |
2022-08-10 12:34:11 | 3 | 5 | 2022-08-10 12:34:11 |
Any SQL or Pandas solution will be appreciated. I tried using SQL window functions but couldn't get the logic right.
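Here's a SQL solution that uses row_number() and integer division to split each ID into chunks of three rows, then takes the max Datetime of each chunk with a window function. Subtracting 1 from the row number before dividing by 3 maps rows 1-3 to 0, rows 4-6 to 1, and so on. This relies on / performing integer division on integers (as in SQL Server or PostgreSQL); in dialects where it does not (e.g. MySQL), wrap the expression in floor().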
    select Datetime
          ,ID
          ,max(Datetime) over(partition by id, grp3) as mx_grp3
    from (
        -- grp3 is 0 for rows 1-3 of each id, 1 for rows 4-6, and so on
        select *
              ,(row_number() over(partition by id order by Datetime) - 1) / 3 as grp3
        from t
    ) t
Datetime | ID | mx_grp3 |
---|---|---|
2022-08-10 12:12:11.000 | 0 | 2022-08-10 12:13:11.000 |
2022-08-10 12:12:14.000 | 0 | 2022-08-10 12:13:11.000 |
2022-08-10 12:13:11.000 | 0 | 2022-08-10 12:13:11.000 |
2022-08-10 12:15:11.000 | 0 | 2022-08-10 12:15:11.000 |
2022-08-10 12:18:11.000 | 1 | 2022-08-10 12:21:11.000 |
2022-08-10 12:20:11.000 | 1 | 2022-08-10 12:21:11.000 |
2022-08-10 12:21:11.000 | 1 | 2022-08-10 12:21:11.000 |
2022-08-10 12:24:11.000 | 2 | 2022-08-10 12:29:11.000 |
2022-08-10 12:29:11.000 | 2 | 2022-08-10 12:29:11.000 |
2022-08-10 12:34:11.000 | 3 | 2022-08-10 12:34:11.000 |
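
Since the question also asks for Pandas, here is a minimal sketch of the same idea, not a definitive implementation. It assumes a DataFrame with the `Datetime` and `ID` columns from the example; the names `chunk`, `group_of_3`, and `max_time` are made up for illustration. `groupby(...).cumcount()` plays the role of `row_number() - 1`, and `// 3` is the integer division.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-08-10 12:12:11", "2022-08-10 12:12:14", "2022-08-10 12:13:11",
        "2022-08-10 12:15:11", "2022-08-10 12:18:11", "2022-08-10 12:20:11",
        "2022-08-10 12:21:11", "2022-08-10 12:24:11", "2022-08-10 12:29:11",
        "2022-08-10 12:34:11",
    ]),
    "ID": [0, 0, 0, 0, 1, 1, 1, 2, 2, 3],
})

# Make sure rows are ordered by the existing group, then by time.
df = df.sort_values(["ID", "Datetime"]).reset_index(drop=True)

# 0-based row position within each ID, integer-divided by 3:
# rows 0-2 -> chunk 0, rows 3-5 -> chunk 1, ...
df["chunk"] = df.groupby("ID").cumcount() // 3

# Running group number across the whole frame (1, 2, 3, ...).
df["group_of_3"] = df.groupby(["ID", "chunk"]).ngroup() + 1

# Maximum Datetime within each (ID, chunk) pair, broadcast back to every row.
df["max_time"] = df.groupby(["ID", "chunk"])["Datetime"].transform("max")

print(df[["Datetime", "ID", "group_of_3", "max_time"]])
```

Unlike the SQL query above, this sketch also produces the running "Group of 3" number from the question, using `ngroup()` over the sorted `(ID, chunk)` pairs.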