Selecting rows in a pandas DataFrame that share the same id and were created within 1 minute of each other

I have a pandas DataFrame with multiple rows, each holding an identifier and the datetime at which the row was created.

id	creation_date
MP5694319-001	2024-05-29 12:58:00.489
MP5694319-001	2024-05-29 12:58:03.489
MP5694331-001	2024-05-29 12:58:05.489
MP5694319-001	2024-05-29 13:03:03.489

From this DataFrame I need to keep only the rows that share the same id and were created within 1 minute of each other. The expected result:

id	creation_date
MP5694319-001	2024-05-29 12:58:00.489
MP5694319-001	2024-05-29 12:58:03.489

The DataFrame contains many rows with various ids, so I need to group them somehow and then compute the time difference, but I have no idea how to do this.

Solution

You can sort with DataFrame.sort_values first, compute the difference within each group in both directions with SeriesGroupBy.diff, and then filter with Series.le (less than or equal) in boolean indexing:

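import pandas as pd

# Parse the strings to datetimes so that differences become timedeltas
df['creation_date'] = pd.to_datetime(df['creation_date'])

# Sort so that neighbouring rows of the same id are consecutive in time
df = df.sort_values(['id','creation_date'])

g = df.groupby('id')['creation_date']
# diff() is the gap to the previous row, diff(-1) the (negative) gap to the next row
out = df[g.diff().le('1 min') | g.diff(-1).abs().le('1 min')]
print(out)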
              id           creation_date
0  MP5694319-001 2024-05-29 12:58:00.489
1  MP5694319-001 2024-05-29 12:58:03.489
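
To see why both directions are needed, it may help to look at the two gaps separately. The following is a minimal, self-contained sketch that rebuilds the sample data from the question; the names `limit`, `gap_prev` and `gap_next` are only illustrative, and the threshold is written as an explicit `pd.Timedelta` instead of the string `'1 min'`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": ["MP5694319-001", "MP5694319-001", "MP5694331-001", "MP5694319-001"],
    "creation_date": [
        "2024-05-29 12:58:00.489",
        "2024-05-29 12:58:03.489",
        "2024-05-29 12:58:05.489",
        "2024-05-29 13:03:03.489",
    ],
})
df["creation_date"] = pd.to_datetime(df["creation_date"])
df = df.sort_values(["id", "creation_date"])

limit = pd.Timedelta(minutes=1)  # illustrative name; inclusive threshold
g = df.groupby("id")["creation_date"]

# Gap to the previous row of the same id (NaT for the first row of each id)
gap_prev = g.diff()
# Gap to the next row of the same id (NaT for the last row of each id)
gap_next = g.diff(-1).abs()

# Comparisons against NaT are False, so rows without a neighbour drop out
out = df[gap_prev.le(limit) | gap_next.le(limit)]
print(out)
```

Row 0 is kept because of `gap_next` (3 seconds to the next row), row 1 because of `gap_prev`. The row at 13:03:03.489 is 5 minutes after its predecessor and has no successor, so it is dropped, and the single MP5694331-001 row has no neighbour at all.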

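Another idea is to back-fill the first difference in each group (which is NaT) so that the first row of a close pair is included in the output. This compares each row only with its previous row (and the first row with its next one), so a row that is close only to the following row can be missed; for the sample data the result is the same: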

df['creation_date'] = pd.to_datetime(df['creation_date'])

df = df.sort_values(['id','creation_date'])

g = df.groupby('id')['creation_date']
# Per group: gap to the previous row, with the leading NaT replaced by the next gap
out = df[g.transform(lambda x: x.diff().bfill()).le('1 min')]
print(out)
              id           creation_date
0  MP5694319-001 2024-05-29 12:58:00.489
1  MP5694319-001 2024-05-29 12:58:03.489
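
The two variants agree on the sample data, but they are not equivalent in general. Here is a small sketch with hypothetical data (a single made-up id `A` with rows at 12:00:00, 12:05:00 and 12:05:03) where the close pair is not at the start of the group:

```python
import pandas as pd

# Hypothetical data, not from the question: the close pair is the 2nd and 3rd row
df = pd.DataFrame({
    "id": ["A", "A", "A"],
    "creation_date": pd.to_datetime([
        "2024-05-29 12:00:00",
        "2024-05-29 12:05:00",
        "2024-05-29 12:05:03",
    ]),
})
limit = pd.Timedelta(minutes=1)
g = df.groupby("id")["creation_date"]

# Variant 1: gap to the previous OR to the next row
both_directions = df[g.diff().le(limit) | g.diff(-1).abs().le(limit)]

# Variant 2: gap to the previous row only, first NaT back-filled
backfilled = df[g.transform(lambda x: x.diff().bfill()).le(limit)]

print(both_directions.index.tolist())  # [1, 2]
print(backfilled.index.tolist())       # [2]
```

The back-filled gaps are 5 min, 5 min and 3 s, so only the last row passes, while checking both directions keeps the whole pair. If close rows can appear anywhere in a group, the first variant is probably the safer choice.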