python, pandas, dataframe, scikit-learn, scikits

Filling NaN values with binary digits


I have some data in which the column "sex" is listed as Male or Female. When this data is loaded into Google Colab, every value in the "sex" column shows up as NaN.

Is there a way to make this column hold 0 for Male and 1 for Female? I have tried using the replace function, but I keep getting the error shown in the image.

Code/Error: (image)

Data: (image)


Solution

  • The code below reproduces your sample data and shows one way to parse it to get the desired outcome:

    #!/home/Karn_python3/bin/python
    from __future__ import (absolute_import, division, print_function)
    import pandas as pd
    import numpy as np
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', None)
    pd.set_option('max_colwidth', None)
    pd.set_option('expand_frame_repr', False)
    
    
    # Read CSV and create dataframe.
    df = pd.read_csv('adult_test.csv')
    
    # The string values appear to have spaces around them, so strip every
    # object (string) column first to avoid mapping/processing issues.
    df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    
    # Create a dictionary and map it onto the desired column, which is simpler
    # and faster than replace.
    m = {'Male': 0, 'Female': 1}
    
    # Values that are not keys of the dictionary (including NaN) come out of
    # map as NaN, so fill them with an int of your choice (0 is used here) and
    # convert the dtype to int; otherwise the column ends up as float.
    df['Sex'] = df['Sex'].map(m).fillna(0).astype(int)
    print(df.head(20))
    

    Resulting output:

                         Age         Workclass    fnlwgt     Education  Education_Num      Martial_Status         Occupation   Relationship   Race  Sex  Capital_Gain  Capital_Loss  Hours_per_week        Country  Target
    0   |1x3 Cross validator               NaN       NaN           NaN            NaN                 NaN                NaN            NaN    NaN    0           NaN           NaN             NaN            NaN     NaN
    1                     25           Private  226802.0          11th            7.0       Never-married  Machine-op-inspct      Own-child  Black    0           0.0           0.0            40.0  United-States  <=50K.
    2                     38           Private   89814.0       HS-grad            9.0  Married-civ-spouse    Farming-fishing        Husband  White    0           0.0           0.0            50.0  United-States  <=50K.
    3                     28         Local-gov  336951.0    Assoc-acdm           12.0  Married-civ-spouse    Protective-serv        Husband  White    0           0.0           0.0            40.0  United-States   >50K.
    4                     44           Private  160323.0  Some-college           10.0  Married-civ-spouse  Machine-op-inspct        Husband  Black    0        7688.0           0.0            40.0  United-States   >50K.
    5                     18               NaN  103497.0  Some-college           10.0       Never-married                NaN      Own-child  White    1           0.0           0.0            30.0  United-States  <=50K.
    6                     34           Private  198693.0          10th            6.0       Never-married      Other-service  Not-in-family  White    0           0.0           0.0            30.0  United-States  <=50K.
    7                     29               NaN  227026.0       HS-grad            9.0       Never-married                NaN      Unmarried  Black    0           0.0           0.0            40.0  United-States  <=50K.
    8                     63  Self-emp-not-inc  104626.0   Prof-school           15.0  Married-civ-spouse     Prof-specialty        Husband  White    0        3103.0           0.0            32.0  United-States   >50K.
    9                     24           Private  369667.0  Some-college           10.0       Never-married      Other-service      Unmarried  White    1           0.0           0.0            40.0  United-States  <=50K.
    10                    55           Private  104996.0       7th-8th            4.0  Married-civ-spouse       Craft-repair        Husband  White    0           0.0           0.0            10.0  United-States  <=50K.
    11                    65           Private  184454.0       HS-grad            9.0  Married-civ-spouse  Machine-op-inspct        Husband  White    0        6418.0           0.0            40.0  United-States   >50K.
    12                    36       Federal-gov  212465.0     Bachelors           13.0  Married-civ-spouse       Adm-clerical        Husband  White    0           0.0           0.0            40.0  United-States  <=50K.
    13                    26           Private   82091.0       HS-grad            9.0       Never-married       Adm-clerical  Not-in-family  White    1           0.0           0.0            39.0  United-States  <=50K.
    14                    58               NaN  299831.0       HS-grad            9.0  Married-civ-spouse                NaN        Husband  White    0           0.0           0.0            35.0  United-States  <=50K.
    15                    48           Private  279724.0       HS-grad            9.0  Married-civ-spouse  Machine-op-inspct        Husband  White    0        3103.0           0.0            48.0  United-States   >50K.
    16                    43           Private  346189.0       Masters           14.0  Married-civ-spouse    Exec-managerial        Husband  White    0           0.0           0.0            50.0  United-States   >50K.
    17                    20         State-gov  444554.0  Some-college           10.0       Never-married      Other-service      Own-child  White    0           0.0           0.0            25.0  United-States  <=50K.
    18                    43           Private  128354.0       HS-grad            9.0  Married-civ-spouse       Adm-clerical           Wife  White    1           0.0           0.0            30.0  United-States  <=50K.
    19                    37           Private   60548.0       HS-grad            9.0             Widowed  Machine-op-inspct      Unmarried  White    1           0.0           0.0            20.0  United-States  <=50K.
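
To see why stripping matters, here is a minimal sketch on a tiny in-memory CSV rather than adult_test.csv. The sample rows and the names csv_text, unstripped and df_alt are made up for illustration. It assumes the values carry a leading space after the comma, which is one plausible reason the mapping yields NaN everywhere. The last lines show a swapped-in alternative that is not part of the answer above: read_csv's skipinitialspace parameter, which drops spaces that follow a delimiter.

```python
import io

import pandas as pd

# Hypothetical sample: each value has a leading space, and the last row has no Sex.
csv_text = "Age,Sex\n25, Male\n18, Female\n29,\n"
df = pd.read_csv(io.StringIO(csv_text))

m = {'Male': 0, 'Female': 1}

# Without stripping, ' Male' is not a key of m, so every row maps to NaN.
unstripped = df['Sex'].map(m)

# Strip first, then map, fill the remaining gaps with 0, and cast to int.
df['Sex'] = df['Sex'].str.strip().map(m).fillna(0).astype(int)
print(df)

# Alternative: let read_csv discard the space after each delimiter.
df_alt = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df_alt['Sex'].map(m))
```

Note that fillna(0) makes a missing value indistinguishable from Male; pick a different fill value if that distinction matters for your model.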
    

    A tidier variant:

    Since the column also contains NaN values, it is neater to include them in the dictionary itself, e.g. m = {'Male': 0, 'Female': 1, np.nan: 0}, so that everything is mapped in one step instead of calling fillna afterwards.

    df = pd.read_csv('adult_test.csv')
    df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    m = {'Male': 0, 'Female': 1, np.nan: 0}
    df['Sex'] = df['Sex'].map(m)
    print(df.head(20))
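
A small sketch of what the NaN key does and does not cover, using a made-up Series instead of the CSV. The label 'Unknown' is a hypothetical stray value and is not taken from the data above. The NaN key handles missing entries, but any other label absent from the dictionary still comes out of map as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical column: one missing entry and one label the dict does not know.
sex = pd.Series(['Male', 'Female', np.nan, 'Unknown'])

m = {'Male': 0, 'Female': 1, np.nan: 0}
mapped = sex.map(m)

print(mapped)
# The NaN entry is mapped through the np.nan key,
# while 'Unknown' has no key and therefore becomes NaN.
print(mapped.isna().tolist())
```

If stray labels are possible, checking mapped.isna().sum() after the mapping is a cheap way to catch them before casting to int.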
    

    Another solution with replace:

    The same dictionary can also be passed to replace, keyed by the column name:

    df = pd.read_csv('adult_test.csv')
    df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    m = {'Male': 0, 'Female': 1, np.nan: 0}
    df = df.replace({'Sex': m})
    print(df.head(20))
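
One practical difference between the two approaches, sketched on a made-up Series (the 'Unknown' label is hypothetical): map turns any value without a dictionary key into NaN, whereas replace leaves such values untouched.

```python
import pandas as pd

# Hypothetical column containing one label that is not in the dict.
sex = pd.Series(['Male', 'Female', 'Unknown'])
m = {'Male': 0, 'Female': 1}

mapped = sex.map(m)        # values without a key become NaN
replaced = sex.replace(m)  # values without a key are kept unchanged

print(mapped.tolist())
print(replaced.tolist())
```

Which behavior is preferable depends on the data: NaN makes unexpected labels easy to spot, while replace keeps the original text available for inspection.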
    

    For more detail, refer to @jpp's answer to "Replace values in a pandas series via dictionary efficiently".