Tags: python, pandas, missing-data

Missing value imputation based on regression in pandas


I want to impute missing data using multivariate imputation. In the data set below, column A has some missing values, and columns A and B have a correlation coefficient of 0.70. I want to fit a regression-style relationship between column A and column B and use it to impute the missing values in Python.

N.B.: I can already do this with the mean, median, or mode, but I want to use the relationship with another column to fill in the missing values.

How should I approach this problem?

    import numpy as np
    import pandas as pd
    # sklearn.preprocessing.Imputer (removed in scikit-learn 0.22) is not needed to build the data.

    # Column data as lists; np.nan marks the missing values in A.
    data = {'Date': ['9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/21/14'],
            'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04, np.nan, np.nan, 19.43, 54.375, 38.41],
            'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675, 2.05, 4.06, -0.80, 0.45, -0.90],
            'C': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c']}

    # Create the DataFrame and parse the date strings (month/day/two-digit year).
    df = pd.DataFrame(data)
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
    # Print the result.
    print(df)
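
The correlation quoted above is worth checking on the sample itself before relying on B to predict A. A minimal sketch, assuming the same A and B values as in the snippet above and using pandas' `Series.corr` (Pearson by default, skipping rows where either value is missing); the variable name `r` is only illustrative:

```python
import numpy as np
import pandas as pd

# Same A and B columns as in the question; np.nan marks the missing values.
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Pearson correlation computed only on rows where both A and B are present.
r = df['A'].corr(df['B'])
print(f"Pearson r between A and B: {r:.3f}")
```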

Solution

  • Use:

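    from sklearn.neural_network import MLPRegressor

    # Split into rows where A is known (training) and rows where A is missing (to impute).
    dfreg = df[df['A'].notna()]
    dfimp = df[df['A'].isna()]

    # Fit a regressor that predicts A from B; scikit-learn expects a 2-D feature array.
    regr = MLPRegressor(random_state=1, max_iter=200).fit(dfreg['B'].values.reshape(-1, 1), dfreg['A'])
    # R^2 on the training rows, as a rough check of the fit.
    regr.score(dfreg['B'].values.reshape(-1, 1), dfreg['A'])

    # Predicted A values for the rows where A is missing.
    regr.predict(dfimp['B'].values.reshape(-1, 1))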
    

    Note that in the provided data the correlation between columns A and B is very low (less than 0.05). To replace the empty cells with the imputed values:

    # Index of the rows where A is missing.
    s = df[df['A'].isna()].index
    # Assign the model's predictions (not its R^2 score) to those cells.
    df.loc[s, 'A'] = regr.predict(dfimp['B'].values.reshape(-1, 1))
    

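
Since the question asks for a regression-type relationship, the same fill can also be sketched with an ordinary least-squares line in place of the MLPRegressor. This is a minimal sketch, not the answerer's method: it assumes only the A and B columns from the question and uses `numpy.polyfit` with degree 1; the names `missing`, `slope` and `intercept` are illustrative.

```python
import numpy as np
import pandas as pd

# Same A and B columns as in the question; np.nan marks the missing values.
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Boolean mask of the rows where A needs to be imputed.
missing = df['A'].isna()

# Fit A = slope * B + intercept on the rows where A is known.
slope, intercept = np.polyfit(
    df.loc[~missing, 'B'].to_numpy(),
    df.loc[~missing, 'A'].to_numpy(),
    deg=1,
)

# Fill only the missing cells with the fitted values; observed values are untouched.
df.loc[missing, 'A'] = slope * df.loc[missing, 'B'] + intercept

print(df)
```

With a correlation this weak the fitted line is nearly flat, so the filled values stay close to the mean of the observed A values; a regression fill of this kind is more informative when the two columns are strongly related.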