pandas, timedelta

Get minimum time delta after grouping


I have something like this:

import pandas as pd

df_columns = {
    'firm_ID': [1, 1, 2, 2, 2],
    'date_incident': ['2015-01-01', '2015-01-01', '2016-10-01', '2016-10-01', '2016-10-01'],
    'date_meeting': ['2014-02-01', '2016-03-01', '2015-10-01', '2017-02-01', '2018-11-01'],
    }
simple_df = pd.DataFrame(df_columns)
# Parse the date strings so they can be subtracted
simple_df['date_incident'] = pd.to_datetime(simple_df['date_incident'])
simple_df['date_meeting'] = pd.to_datetime(simple_df['date_meeting'])
# Positive delta: meeting before the incident; negative: meeting after it
simple_df['date_delta'] = simple_df['date_incident'] - simple_df['date_meeting']

There is only one date_incident per firm_ID, but several date_meeting values per firm_ID. I want an additional column that returns, for each firm_ID, the date delta that is closest to zero (the smallest in absolute value). Note that this delta can be negative as well.


For example, for firm_ID = 2 the new column should contain -123 days in every row, because the closest meeting took place 123 days after the incident.

Thanks.


Solution

  • Use SeriesGroupBy.idxmin to find, for each firm_ID, the index of the row whose timedelta (converted to days) has the smallest absolute value. Then create the new column by mapping each firm_ID to that row's date_delta with Series.map:

    idx = simple_df['date_delta'].dt.days.abs().groupby(simple_df['firm_ID']).idxmin()
    df = simple_df.loc[idx]
    
    simple_df['new'] = simple_df['firm_ID'].map(df.set_index('firm_ID')['date_delta'])
    print(simple_df)
       firm_ID date_incident date_meeting date_delta       new
    0        1    2015-01-01   2014-02-01   334 days  334 days
    1        1    2015-01-01   2016-03-01  -425 days  334 days
    2        2    2016-10-01   2015-10-01   366 days -123 days
    3        2    2016-10-01   2017-02-01  -123 days -123 days
    4        2    2016-10-01   2018-11-01  -761 days -123 days
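
As an alternative to the idxmin-and-map steps above, the same column can probably be built in one pass with GroupBy.transform. This is only a sketch, not the answerer's method: the helper name closest_delta and the column name closest_delta are made up for illustration, and it compares the full timedeltas instead of whole days.

```python
import pandas as pd

# Rebuild the question's data so the example runs on its own
simple_df = pd.DataFrame({
    'firm_ID': [1, 1, 2, 2, 2],
    'date_incident': pd.to_datetime(
        ['2015-01-01', '2015-01-01', '2016-10-01', '2016-10-01', '2016-10-01']),
    'date_meeting': pd.to_datetime(
        ['2014-02-01', '2016-03-01', '2015-10-01', '2017-02-01', '2018-11-01']),
})
simple_df['date_delta'] = simple_df['date_incident'] - simple_df['date_meeting']


def closest_delta(deltas):
    """Hypothetical helper: return the signed delta closest to zero in one group."""
    # idxmin gives the label of the smallest absolute delta; .loc keeps its sign
    return deltas.loc[deltas.abs().idxmin()]


# transform broadcasts the single value returned per group to every row of that group
simple_df['closest_delta'] = (
    simple_df.groupby('firm_ID')['date_delta'].transform(closest_delta)
)
print(simple_df[['firm_ID', 'date_delta', 'closest_delta']])
```

If two meetings are equally close (for example +10 and -10 days), idxmin picks the first occurrence, so the row order decides which sign ends up in the column.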