I have a pandas DataFrame that I want to group by the 'id' column then add an extra row to the bottom of each group where the date for that row is one business day forward from the last row of the group.
df = pd.DataFrame(data={'d': [datetime.date(2010,12,30), datetime.date(2010,12,31), datetime.date(2010,12,30),datetime.date(2010,12,31)], 'id': [1,1,2,2], 'val': [10,200, 90,420]})
I have:
Date id val
0 2010-12-30 1 10
1 2010-12-31 1 200
2 2010-12-30 2 90
3 2010-12-31 2 420
I want:
Date id val
0 2010-12-30 1 10
1 2010-12-31 1 200
2 2011-01-01 1 NaN
3 2010-12-30 2 90
4 2010-12-31 2 420
5 2011-01-01 2 Nan
The solution shown here from zero seems like it should work:
Pandas: add row to each group depending on condition
I have tried adapting it to my situation but just can't make it work:
def add_row(x):
from pandas.tseries.offsets import BDay
last_row = x.iloc[-1]
last_row['Date'] = x.Date + BDay(1)
return x.append(last_row)
df.groupby('id').apply(add_row)
AttributeError: 'DataFrame' object has no attribute 'Date'
I don't just want to resolve this particular error message I want a solution to the problem.
I would use:
df['d'] = pd.to_datetime(df['d'])
out = pd.concat([df,
(df.loc[df.groupby('id')['d'].idxmax(), ['d', 'id']]
.assign(d=lambda x: x['d'].add(pd.DateOffset(days=1)))
)
]
).sort_index(kind='stable', ignore_index=True)
NB. for business days replace pd.DateOffset(days=1)
by pd.offsets.BusinessDay(1)
.
Output:
d id val
0 2010-12-30 1 10.0
1 2010-12-31 1 200.0
2 2011-01-01 1 NaN
3 2010-12-30 2 90.0
4 2010-12-31 2 420.0
5 2011-01-01 2 NaN
Alternative:
last = df.groupby('id')['d'].idxmax()
out = df.loc[df.index.repeat(df.index.isin(last)+1)]
m = out.index.duplicated()
out.loc[m, 'd'] += pd.DateOffset(days=1)
out.loc[m, 'val'] = float('nan')