I have a dataframe of orders received such as:
Date Units
01-01-2020 2
02-01-2020 12
03-01-2020 NaN
04-01-2020 NaN
05-01-2020 5
06-01-2020 9
07-01-2020 NaN
08-01-2020 1
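For reference, the example frame above can be reconstructed like this (dates kept as plain strings, matching the display; the column names are the ones shown in the table):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Date': ['01-01-2020', '02-01-2020', '03-01-2020', '04-01-2020',
             '05-01-2020', '06-01-2020', '07-01-2020', '08-01-2020'],
    'Units': [2, 12, np.nan, np.nan, 5, 9, np.nan, 1],
})
```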
Sometimes customers order a larger quantity so they don't have to reorder for the next 1 or 2 days. I am trying to find the average number of units the customers are using on any given day. For example, if a customer orders 12 and then does not order for 2 days, all three days' units should be changed to 4. The output would be:
Date Units
01-01-2020 2
02-01-2020 4
03-01-2020 4
04-01-2020 4
05-01-2020 5
06-01-2020 4.5
07-01-2020 4.5
08-01-2020 1
I could do it with a combination of ffill and helper columns counting consecutive NaN values, but I feel there must be a more elegant way.
We can use notnull with cumsum to create the groupby key, then transform with 'mean':
df['Units'] = df['Units'].fillna(0).groupby(df['Units'].notnull().cumsum()).transform('mean')
0 2.0
1 4.0
2 4.0
3 4.0
4 5.0
5 4.5
6 4.5
7 1.0
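To see why this works, it helps to look at the grouping key on its own: each non-null value flips notnull to True and bumps the cumulative sum, so every order and the NaN days that follow it share one group key. A minimal sketch with the same data:

```python
import pandas as pd
import numpy as np

units = pd.Series([2, 12, np.nan, np.nan, 5, 9, np.nan, 1], dtype=float)

# Each non-null value starts a new group; the NaNs that follow it
# inherit that group's key.
key = units.notnull().cumsum()
# key -> [1, 2, 2, 2, 3, 4, 4, 5]

# fillna(0) makes the NaN days count toward the group's mean,
# spreading each order evenly across the days it covers.
result = units.fillna(0).groupby(key).transform('mean')
# result -> [2.0, 4.0, 4.0, 4.0, 5.0, 4.5, 4.5, 1.0]
```

Note that the key must be computed from the original series (before fillna), otherwise every row would look non-null and each day would become its own group.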