Tags: python, missing-data, fillna

How to label missing values in Python using a loop


This is my DataFrame:

import pandas as pd
from cmath import nan


student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', nan, nan],
                             'class':['H', 'W', 'S']})
student_card

There are two NaN values in the 'name' column, and I want to fill them with 'missing1' and 'missing2' using a loop. (It doesn't have to be a loop, but I have no idea how to index them without one.)

So I wrote this function and got stuck here. It doesn't work. Any help would be appreciated, thanks.

import pandas as pd

def fillna_func(df):
    df = df.copy()
    for i, value in enumerate(df.values):
        if value == nan:
            df[i].apply("deleted{}".format(i))
    return df


fillna_func(student_card['name'])

Solution

  • You could create a boolean mask marking where the name is null, and use it to select those rows of the main DataFrame. Then overwrite those names with 'missing' plus the cumulative sum of the mask, which numbers the missing values 1, 2, and so on.

    import numpy as np
    import pandas as pd
    student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                                 'name':['Kim', np.nan, np.nan],
                                 'class':['H', 'W', 'S']})
    
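    def fillna_func(df):
        # Boolean mask: True where 'name' is missing
        m = df['name'].isnull()
        # cumsum() numbers the missing rows 1, 2, ...; note this modifies df in place
        df.loc[m, 'name'] = 'missing' + m.cumsum().astype(str)
        return df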
    
    fillna_func(student_card)
    

    Output

             ID      name class
    0  20190103       Kim     H
    1  20190222  missing1     W
    2  20190531  missing2     S
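
Since the question asked for a loop, here is a minimal loop-based sketch of the same idea. The function name `fillna_with_loop` and its `prefix` parameter are made up for illustration. It also shows why the original check could never match: NaN does not compare equal to anything, including itself, so `value == nan` is always False and `pd.isna` is the safer test.

```python
import numpy as np
import pandas as pd

def fillna_with_loop(names, prefix="missing"):
    """Return a new Series where each NaN is replaced by prefix + running count."""
    filled = []
    counter = 0
    for value in names:
        # NaN != NaN, so an equality test would never fire; use pd.isna instead
        if pd.isna(value):
            counter += 1
            filled.append(f"{prefix}{counter}")
        else:
            filled.append(value)
    # Rebuild the Series with the original index so it aligns on assignment
    return pd.Series(filled, index=names.index, name=names.name)

student_card = pd.DataFrame({'ID': [20190103, 20190222, 20190531],
                             'name': ['Kim', np.nan, np.nan],
                             'class': ['H', 'W', 'S']})
student_card['name'] = fillna_with_loop(student_card['name'])
print(student_card)
```

This should give the same 'name' column as the output above. For larger frames the mask and `cumsum` version is likely the better choice, since it avoids a Python-level loop.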