I'm trying to compute the difference between rows of a column based on the first and last date, by group. For example:
import pandas as pd

df = pd.DataFrame({'ID':["a","a","a","b","b","b"],
                   'start_yr':[2010,2013,2020,2009,2005,2019],
                   'amt':[10,40,30,50,60,100]})
This should return:
a 20 (30-10)
b 40 (100 - 60)
I've tried things like this, but I'm clearly missing something:
#df['diff'] = df.groupby('ID')['start_yr','amt'].transform(lambda x: (x.max()-x.min()))
df['diff'] = df.groupby('ID')['start_yr'].transform(lambda x: (x.max()-x.min()))
Let's try two steps: sort by start_yr within each ID and take the first and last amt, then subtract:
s = df.sort_values(['ID','start_yr']).groupby(['ID'])['amt'].agg(['first','last'])
output = s['last'] - s['first']
Output:
ID
a 20
b 40
dtype: int64
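If you want the result as a 'diff' column on the original rows (as in the attempted transform), here is one possible sketch that avoids sorting. It uses idxmin/idxmax to locate the rows with the earliest and latest start_yr per ID, then maps the per-group difference back with Series.map. The names first_idx, last_idx and diff are just illustrative, and this assumes start_yr has no ties within a group (with ties, idxmin/idxmax return the first matching row):

```python
import pandas as pd

df = pd.DataFrame({'ID': ["a", "a", "a", "b", "b", "b"],
                   'start_yr': [2010, 2013, 2020, 2009, 2005, 2019],
                   'amt': [10, 40, 30, 50, 60, 100]})

# Row labels of the earliest and latest start_yr within each ID
# (assumption: no duplicate start_yr within a group)
first_idx = df.groupby('ID')['start_yr'].idxmin()
last_idx = df.groupby('ID')['start_yr'].idxmax()

# amt at the latest row minus amt at the earliest row, indexed by ID
diff = pd.Series(
    df.loc[last_idx, 'amt'].to_numpy() - df.loc[first_idx, 'amt'].to_numpy(),
    index=first_idx.index,
    name='diff',
)

# Broadcast the per-group value back onto every row of the group
df['diff'] = df['ID'].map(diff)
```

Here diff holds one value per ID, and df['diff'] repeats that value on each row of the group.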