I am attempting to impute null values with an offset based on the row's average (df.loc[row, 'avg']) and the column's average (impute[col]). Is there a way to do this that would let the method parallelize with .map? Or is there a better way to iterate over the indexes containing null values?
import numpy as np
import pandas as pd

test = pd.DataFrame({'a': [None, 2, 3, 1], 'b': [2, np.nan, 4, 2],
                     'c': [3, 4, np.nan, 3], 'avg': [2.5, 3, 3.5, 2]})
test = test[['a', 'b', 'c', 'avg']]
impute = {'a': 2, 'b': 3.33, 'c': 6}
def smarterImpute(df, impute):
    df2 = df.copy()
    for col in df.columns[:-1]:
        for row in df.index:
            if pd.isnull(df.loc[row, col]):
                df2.loc[row, col] = (impute[col]
                                     + (df.loc[:, 'avg'].mean() - df.loc[row, 'avg']))
    return df2
smarterImpute(test, impute)
Notice that in your 'filling' expression:
impute[col] + (df.loc[:, 'avg'].mean() - df.loc[row, 'avg'])
The first term only depends on the column and the third only on the row; the second is just a constant. So we can create an imputation dataframe to look up whenever there's a value that needs to be filled:
impute_df = pd.DataFrame(impute, index = test.index).add(test.avg.mean() - test.avg, axis = 0)
Then, there's a method in pandas called .combine_first() that fills the NAs in one dataframe with the values from another, which is exactly what we need. We use it, and we're done:
test.combine_first(impute_df)
With pandas, you generally want to avoid explicit loops and make use of vectorized operations instead.
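Putting the steps above together in one runnable sketch (same sample frame and names as in the question):

```python
import numpy as np
import pandas as pd

# Sample data from the question
test = pd.DataFrame({'a': [None, 2, 3, 1], 'b': [2, np.nan, 4, 2],
                     'c': [3, 4, np.nan, 3], 'avg': [2.5, 3, 3.5, 2]})
impute = {'a': 2, 'b': 3.33, 'c': 6}

# Lookup frame: column constant + (overall 'avg' mean - row 'avg'),
# broadcast down the rows with axis=0
impute_df = pd.DataFrame(impute, index=test.index).add(
    test.avg.mean() - test.avg, axis=0)

# Fill each NaN in `test` from the matching cell of `impute_df`;
# columns absent from `impute_df` (like 'avg') pass through untouched
result = test.combine_first(impute_df)
```

For the sample data the mean of 'avg' is 2.75, so e.g. the NaN in row 0, column 'a' becomes 2 + (2.75 - 2.5) = 2.25.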