Search code examples
pythonpandasmeanfill

Distribute value across following NaNs using ffill in pandas


I have a dataframe of orders received such as:

 Date        Units
01-01-2020  2
02-01-2020  12
03-01-2020  NaN
04-01-2020  NaN
05-01-2020  5
06-01-2020  9
07-01-2020  NaN
08-01-2020  1

Sometimes customers will order higher numbers so they don't have to reorder the next 1 or 2 days. I am trying to find the average number the customers are using on any given day. For example, if a customer orders 12 and then does not order for 2 days, all three days' units should be changed to 4. Output would be:

 Date        Units
01-01-2020  2
02-01-2020  4
03-01-2020  4
04-01-2020  4
05-01-2020  5
06-01-2020  4.5
07-01-2020  4.5
08-01-2020  1

I could do it with a combination of ffill and adding helper columns to count consecutive NaN values but I feel there must be a more elegant way.


Solution

  • We can try use isnull with cumsum create the groupby key then do transform

    df['Units'] = df['Units'].fillna(0).groupby(df['Units'].notnull().cumsum()).transform('mean')
    0    2.0
    1    4.0
    2    4.0
    3    4.0
    4    5.0
    5    4.5
    6    4.5
    7    1.0