
A loop to fill null values in a column using the mode value is breaking. Still no working solution


The following is a sample from my data frame:

import pandas as pd
import numpy as np

d=['SKODASUPERB','SKODASUPERB','SKODASUPERB','MERCEDES-BENZE CLASS',
   'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',
   'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',
   'FERRARI360','FERRARILAFERRARI']
x=['SUV','SUV','nan','nan','SPORTS','SPORTS','SPORTS',
   'nan','SEDAN','SEDAN','SEDAN','SEDAN','SPORT','SPORT']



df=pd.DataFrame({'make_model':d,'body':x})
# np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
df.body=df.body.replace('nan',np.nan)
df.head()

Out[24]: 
             make_model    body
0           SKODASUPERB     SUV
1           SKODASUPERB     SUV
2           SKODASUPERB     NaN
3  MERCEDES-BENZE CLASS     NaN
4    ASTON MARTINVIRAGE  SPORTS

There are some null values in the 'body' column:

df.body.isnull().sum()
Out[25]: 3

I am trying to fill the null values in the 'body' column with the mode of the body type for each make_model. For instance, 2 observations of SKODASUPERB have body 'SUV' and 1 observation has a null body. The mode of body for SKODASUPERB is therefore 'SUV', and I want 'SUV' to be filled in for the third observation too. For this I am using the following code:

make_model_list=df.make_model.unique().tolist()

for x in make_model_list:
    df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']=\
        df.loc[(df['make_model']==x)&(df['body'].isnull()),'body']\
            .fillna(df.loc[df['make_model']==x,'body'].mode())

Unfortunately, the loop breaks because some observations don't have a mode value, and the null count is unchanged:

df.body.isnull().sum()
Out[30]: 3
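
One likely reason the count stays at 3, sketched on a toy Series rather than the real data: `Series.mode()` returns a Series, not a scalar, and `fillna` with a Series matches on index labels. The names `body`, `modes`, and `all_missing` below are illustrative only and do not come from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the body values of one make_model (index labels 0, 1, 2).
body = pd.Series(['SUV', 'SUV', np.nan], index=[0, 1, 2], dtype=object)

# mode() returns a Series labelled from 0, here a single row: 0 -> 'SUV'.
modes = body.mode()

# fillna with a Series aligns on index labels; label 2 is not in `modes`,
# so the missing value at label 2 is left untouched.
filled_by_series = body.fillna(modes)

# Passing the mode as a scalar fills every missing row.
filled_by_scalar = body.fillna(modes.iloc[0])

# A group with no non-null values has an empty mode, so there is no value to take.
all_missing = pd.Series([np.nan, np.nan], dtype=object)
no_mode = all_missing.mode()

print(filled_by_series.isna().sum(), filled_by_scalar.isna().sum(), no_mode.empty)
```

If this is what happens in the loop, taking a scalar from the mode (and checking that the mode is not empty first) should address both symptoms.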

How can I force the loop to keep running even if there is no mode 'body' value for a particular make_model? I know that I can use the continue statement, but I am not sure how to write it.
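
A minimal sketch of how `continue` could be used here, assuming the sample data above. The variable names `model`, `same_model`, `modes`, and `remaining` are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
d = ['SKODASUPERB', 'SKODASUPERB', 'SKODASUPERB', 'MERCEDES-BENZE CLASS',
     'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE', 'TOYOTAHIACE',
     'TOYOTAAVENSIS', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS',
     'FERRARI360', 'FERRARILAFERRARI']
x = ['SUV', 'SUV', 'nan', 'nan', 'SPORTS', 'SPORTS', 'SPORTS',
     'nan', 'SEDAN', 'SEDAN', 'SEDAN', 'SEDAN', 'SPORT', 'SPORT']
df = pd.DataFrame({'make_model': d, 'body': x})
df['body'] = df['body'].replace('nan', np.nan)

for model in df['make_model'].unique():
    same_model = df['make_model'] == model
    modes = df.loc[same_model, 'body'].mode()
    if modes.empty:
        # No non-null body for this make_model: skip it and move on.
        continue
    # Take the first mode as a scalar and write it into the missing rows.
    df.loc[same_model & df['body'].isnull(), 'body'] = modes.iloc[0]

remaining = df['body'].isnull().sum()
print(remaining)
```

Rows whose make_model has no non-null body at all (here MERCEDES-BENZE CLASS and TOYOTAHIACE) are skipped and stay null.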


Solution

  • Finally, I have worked out a solution. It was just a matter of wrapping the assignment in try/except. This solution works well for the purpose of my project and has filled 95% of the missing values. I have slightly changed the data to show that this method is effective:

    d=['SKODASUPERB','SKODASUPERB','SKODASUPERB','MERCEDES-BENZE CLASS',
       'ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','ASTON MARTINVIRAGE','TOYOTAHIACE',
       'TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS','TOYOTAAVENSIS',
       'FERRARI360','FERRARILAFERRARI']
    x=['SUV','SUV','nan','nan','SPORTS','SPORTS','nan',
       'nan','SEDAN','SEDAN','nan','SEDAN','SPORT','SPORT']
    df=pd.DataFrame({'make_model':d,'body':x})
    # np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
    df.body=df.body.replace('nan',np.nan)
    
    df
    Out[6]: 
                  make_model    body
    0            SKODASUPERB     SUV
    1            SKODASUPERB     SUV
    2            SKODASUPERB     NaN
    3   MERCEDES-BENZE CLASS     NaN
    4     ASTON MARTINVIRAGE  SPORTS
    5     ASTON MARTINVIRAGE  SPORTS
    6     ASTON MARTINVIRAGE     NaN
    7            TOYOTAHIACE     NaN
    8          TOYOTAAVENSIS   SEDAN
    9          TOYOTAAVENSIS   SEDAN
    10         TOYOTAAVENSIS     NaN
    11         TOYOTAAVENSIS   SEDAN
    
    df.body.isnull().sum()
    Out[7]: 5
    

    My Solution

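    make_model_list=df.make_model.unique().tolist()

    for x in make_model_list:
        # rows of this make_model whose body is missing
        mask=(df['make_model']==x)&(df['body'].isnull())
        try:
            # most frequent non-null body for this make_model
            df.loc[mask,'body']=df.loc[df['make_model']==x,'body'].value_counts().index[0]
        except IndexError:
            # value_counts() is empty: no non-null body to use, leave as NaN
            pass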
    df.body.isnull().sum()
    Out[9]: 2 #null values have dropped from 5 to 2.
    

    Those 2 null values couldn't be filled because their make_model has no non-null body value at all, so there is no mode to fill them with.
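
    As an alternative to the try/except loop (a different technique from the one used above, not the original method), the per-model mode can be collected with `groupby` and applied with `map`. This is a sketch on the modified sample data; `mode_by_model` and `remaining` are names made up for the example.

```python
import numpy as np
import pandas as pd

# Modified sample data from the solution above.
d = ['SKODASUPERB', 'SKODASUPERB', 'SKODASUPERB', 'MERCEDES-BENZE CLASS',
     'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE', 'ASTON MARTINVIRAGE', 'TOYOTAHIACE',
     'TOYOTAAVENSIS', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS', 'TOYOTAAVENSIS',
     'FERRARI360', 'FERRARILAFERRARI']
x = ['SUV', 'SUV', 'nan', 'nan', 'SPORTS', 'SPORTS', 'nan',
     'nan', 'SEDAN', 'SEDAN', 'nan', 'SEDAN', 'SPORT', 'SPORT']
df = pd.DataFrame({'make_model': d, 'body': x})
df['body'] = df['body'].replace('nan', np.nan)

# Collect one mode per make_model, skipping groups with no non-null body.
mode_by_model = {}
for model, bodies in df.groupby('make_model')['body']:
    modes = bodies.mode()
    if not modes.empty:
        mode_by_model[model] = modes.iloc[0]

# Look up each row's mode by make_model; models absent from the dict map to NaN,
# so rows without a usable mode simply stay null.
df['body'] = df['body'].fillna(df['make_model'].map(mode_by_model))

remaining = df['body'].isnull().sum()
print(remaining)
```

    When several body values tie for most frequent, `mode()` returns all of them in sorted order, so `.iloc[0]` picks the first of those; whether that is the right tie-break depends on the data.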