
How to use pandas to subtract rows of a column by group, based on the first and last date?


I'm trying to compute, for each group (ID), the difference in amt between the row with the latest start_yr and the row with the earliest start_yr. For example:

import pandas as pd

df = pd.DataFrame({'ID':["a","a","a","b","b","b"],
                   'start_yr':[2010,2013,2020,2009,2005,2019],
                   'amt':[10,40,30,50,60,100]})

should return:

a    20   (30 - 10)
b    40   (100 - 60)

I've tried something like this, but I'm clearly missing something:

#df['diff'] = df.groupby('ID')[['start_yr','amt']].transform(lambda x: (x.max()-x.min()))
df['diff'] = df.groupby('ID')['start_yr'].transform(lambda x: (x.max()-x.min()))
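The attempt above takes max() - min() of start_yr itself, so it yields the span of years in each group (10 for a, 14 for b) rather than a difference in amt. If a per-row diff column is what you want, a minimal sketch (assuming every row in a group should receive the same value) is to sort first and let transform broadcast "last amt minus first amt" back to each row:

```python
import pandas as pd

df = pd.DataFrame({'ID': ["a", "a", "a", "b", "b", "b"],
                   'start_yr': [2010, 2013, 2020, 2009, 2005, 2019],
                   'amt': [10, 40, 30, 50, 60, 100]})

# Sort so that, within each ID, the first row has the earliest start_yr
# and the last row has the latest start_yr.
ordered = df.sort_values(['ID', 'start_yr'])

# transform broadcasts the scalar (last amt - first amt) to every row of
# its group; assignment aligns on the index, so df keeps its original order.
df['diff'] = ordered.groupby('ID')['amt'].transform(lambda x: x.iloc[-1] - x.iloc[0])
print(df)
```

Every a row gets 20 and every b row gets 40.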

Solution

  • Let's try two steps: sort by start_yr within each ID and take the first and last amt of each group, then subtract:

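    # Sort so 'first'/'last' per ID are the rows with the earliest/latest start_yr
    s = df.sort_values(['ID','start_yr']).groupby(['ID'])['amt'].agg(['first','last'])
    # amt at the latest year minus amt at the earliest year, per ID
    output = s['last'] - s['first']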
    

    Output:

    ID
    a    20
    b    40
    dtype: int64
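As an alternative that avoids sorting the whole frame, here is a sketch using idxmin/idxmax on start_yr to locate the earliest and latest row of each group, then looking up amt at those rows. It assumes start_yr has no missing values; on ties, idxmin/idxmax return the first matching row.

```python
import pandas as pd

df = pd.DataFrame({'ID': ["a", "a", "a", "b", "b", "b"],
                   'start_yr': [2010, 2013, 2020, 2009, 2005, 2019],
                   'amt': [10, 40, 30, 50, 60, 100]})

by_id = df.groupby('ID')['start_yr']
# Row labels of the earliest and latest start_yr in each group.
first_rows = by_id.idxmin()
last_rows = by_id.idxmax()

# Look up amt at those rows, indexed by ID so the two Series align.
first_amt = df.loc[first_rows].set_index('ID')['amt']
last_amt = df.loc[last_rows].set_index('ID')['amt']

output = last_amt - first_amt
print(output)
```

This gives the same per-ID result (a: 20, b: 40) as the sort-based answer.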