How to replace a particular value with the mean of its column from the first row to the corresponding row in a pandas DataFrame


I have a DataFrame like this:

A   B   C   D   E
1   2   3   0   2
2   0   7   1   1
3   4   0   3   0
0   0   3   4   3

I am trying to replace every 0 with the mean() of its column, taken from the first row through the row that contains the 0.

My expected output is:

A       B       C           D       E
1.0     2.00    3.000000    0.0     2.0
2.0     1.00    7.000000    1.0     1.0
3.0     4.00    3.333333    3.0     1.0
1.5     1.75    3.000000    4.0     3.0
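
To make the rule concrete, here is a small sketch that rebuilds the sample frame and works through column B by hand. The variable name `b` is only illustrative. Note that the 0 itself counts as a row in the mean, and that the second 0 uses the already replaced value from row 1.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 0],
    "B": [2, 0, 4, 0],
    "C": [3, 7, 0, 3],
    "D": [0, 1, 3, 4],
    "E": [2, 1, 0, 3],
})

b = df["B"].astype(float).tolist()   # [2.0, 0.0, 4.0, 0.0]
b[1] = sum(b[:2]) / 2                # (2 + 0) / 2 = 1.0
b[3] = sum(b[:4]) / 4                # (2 + 1 + 4 + 0) / 4 = 1.75
print(b)                             # [2.0, 1.0, 4.0, 1.75]
```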

Solution

  • The main difficulty is that when a column contains several 0 values, each later mean depends on the previously filled-in means, so a vectorized solution is really problematic. A loop over each column works:

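    def f(x):
        # x is one column; walk it top to bottom so that earlier
        # replacements are already in place for later means
        for i, v in enumerate(x):
            if v == 0:
                # mean of rows 0..i, the zero itself included
                x.iloc[i] = x.iloc[:i+1].mean()
        return x

    # cast to float first so the means are not truncated
    df1 = df.astype(float).apply(f)
    print(df1)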
    
         A     B         C    D    E
    0  1.0  2.00  3.000000  0.0  2.0
    1  2.0  1.00  7.000000  1.0  1.0
    2  3.0  4.00  3.333333  3.0  1.0
    3  1.5  1.75  3.000000  4.0  3.0
    

    Better solution:

    import numpy as np
    import pandas as pd

    # collect the positions of the zero values in a helper DataFrame
    a, b = np.where(df.values == 0)
    df1 = pd.DataFrame({'rows': a, 'cols': b})
    # zeros in the first row need no mean: they stay 0
    df1 = df1[df1['rows'] != 0]
    print(df1)
       rows  cols
    1     1     1
    2     2     2
    3     2     4
    4     3     0
    5     3     1
    
    # loop over the rows of the helper DataFrame and assign the means
    for i in df1.itertuples():
        df.iloc[i.rows, i.cols] = df.iloc[:i.rows+1, i.cols].mean()
    
    print (df)
         A     B         C  D    E
    0  1.0  2.00  3.000000  0  2.0
    1  2.0  1.00  7.000000  1  1.0
    2  3.0  4.00  3.333333  3  1.0
    3  1.5  1.75  3.000000  4  3.0
    

    Another similar solution (computing the mean for every zero position, including zeros in the first row):

    for i, j in zip(*np.where(df.values == 0)):
        df.iloc[i, j] = df.iloc[:i+1, j].mean()
    print (df)
    
         A     B         C    D    E
    0  1.0  2.00  3.000000  0.0  2.0
    1  2.0  1.00  7.000000  1.0  1.0
    2  3.0  4.00  3.333333  3.0  1.0
    3  1.5  1.75  3.000000  4.0  3.0
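
As a variation on the loops above (a sketch, not part of the original answer): because a 0 adds nothing to the column sum, the mean through row i equals the running sum of the rows above it divided by i + 1. Keeping a running sum per column therefore avoids recomputing `.mean()` over a slice for every zero. The helper name `fill_zeros_with_running_mean` is hypothetical, and the function works on a float copy, so the input frame is left unchanged.

```python
import pandas as pd


def fill_zeros_with_running_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a float copy of df in which each 0 is replaced by the mean of
    its column from the first row through that row (the 0 itself included)."""
    values = df.to_numpy(dtype=float, copy=True)
    n_rows, n_cols = values.shape
    for j in range(n_cols):
        running_sum = 0.0  # sum of column j over the rows already visited
        for i in range(n_rows):
            if values[i, j] == 0:
                # The zero contributes nothing, so the mean over rows 0..i
                # is the running sum divided by the number of rows so far.
                values[i, j] = running_sum / (i + 1)
            running_sum += values[i, j]
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 0],
    "B": [2, 0, 4, 0],
    "C": [3, 7, 0, 3],
    "D": [0, 1, 3, 4],
    "E": [2, 1, 0, 3],
})

result = fill_zeros_with_running_mean(df)
print(result)
```

On the sample data this should reproduce the expected output from the question, with a zero in the first row staying 0.0.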