
How to fill NA values with the mean of the previous three values in Python


I need to fill the NA values with the mean of the three values that come before them.

This is my dataset:

        RECEIPT_MONTH_YEAR  NET_SALES
    0   2014-01-01          818817.20
    1   2014-02-01          362377.20
    2   2014-03-01          374644.60
    3   2014-04-01          NA
    4   2014-05-01          NA
    5   2014-06-01          NA
    6   2014-07-01          NA
    7   2014-08-01          46382.50
    8   2014-09-01          55933.70
    9   2014-10-01          292303.40
    10  2014-10-01          382928.60


Solution

  • Is this dataset a .csv file or a DataFrame? Is the NA an actual NaN or the string 'NA'?

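    import pandas as pd
    import numpy as np

    # 'your dataset' is a placeholder path; sep=' ' assumes space-separated columns
    df = pd.read_csv('your dataset', sep=' ')
    # replace() returns a new object, so assign the result back
    df = df.replace('NA', np.nan)
    # forward-fill: carry the last valid observation into the NaNs that follow
    df = df.ffill()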
    

    You mention the mean of three values. The code above does not do that: it simply forward-fills the last observation before the NaNs begin. That is often a reasonable baseline for forecasting, and in some cases better than taking means, if persistence is important.

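    # index labels of the rows where NET_SALES is NaN (assumes a default 0..n-1 index)
    ind = df.index[df['NET_SALES'].isna()]
    # mean of the three NET_SALES values just before the first NaN
    Meanof3 = df['NET_SALES'].iloc[ind[0]-3:ind[0]].mean(skipna=True)
    # fill every NaN in the column with that single value
    df['NET_SALES'] = df['NET_SALES'].fillna(Meanof3)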
    

    The answer could be generalised and improved if more were known about the dataset, for example whether you always want the mean of the last three measurements before any NA. The code above finds the indices of the NaNs and takes the mean of the three values before the first one, ignoring any NaNs among them.
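
    If you do want every NA filled from the three values just before it, here is a minimal sketch of one way to generalise this. The helper name fill_with_previous_mean and its window parameter are made up for illustration; they are not part of pandas. It assumes that values filled earlier in the pass count as history for later NaNs, so a run of consecutive NaNs is filled one after another. It also coerces the string 'NA' to NaN first, in case the column was read as text.

```python
import numpy as np
import pandas as pd


def fill_with_previous_mean(series, window=3):
    """Fill each NaN with the mean of up to `window` values just before it.

    Hypothetical helper, not a pandas function. Values filled earlier in
    the pass are used as history for later NaNs.
    """
    # Coerce strings such as 'NA' to NaN and work on a writable float copy.
    values = pd.to_numeric(series, errors="coerce").to_numpy(dtype=float, copy=True)
    for i in range(len(values)):
        if np.isnan(values[i]):
            history = values[max(0, i - window):i]
            history = history[~np.isnan(history)]
            if history.size > 0:  # leading NaNs have no history and stay NaN
                values[i] = history.mean()
    return pd.Series(values, index=series.index, name=series.name)


# NET_SALES column from the question, with 'NA' kept as a string on purpose.
sales = pd.Series(
    [818817.20, 362377.20, 374644.60, "NA", "NA", "NA", "NA",
     46382.50, 55933.70, 292303.40, 382928.60],
    name="NET_SALES",
)
filled = fill_with_previous_mean(sales)
print(filled)
```

    With the data from the question, row 3 becomes the mean of rows 0 to 2, row 4 the mean of rows 1 to 3 (including the freshly filled row 3), and so on. If you would rather give every NaN in a gap the same value, compute the mean once before the gap, as in the snippet above, instead of filling one row at a time.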