python pandas data-mining enumerate data-preprocessing

How to convert nominal data to numeric in Python?


I am working with a binary classification dataset and want to convert its nominal class labels to numeric values. How should I do this?

age | class
------------
 1 |  no
 2 |  yes
 3 |  no
 4 |  yes
 5 |  no
 6 |  no
 7 |  no
 8 |  yes
 9 |  no
10 |  y

Code:

mapping = {label:idx for idx,label in enumerate(np.unique(['class']))}
df['class'] = df['class'].map(mapping)

Desired output: {'no': 0, 'yes': 1}


Solution

  • The problem with your code is this call:

    np.unique(['class'])
    

    You are asking for the unique values of the list ['class'], which contains only the single string 'class', not the values of the column. Change it to:

    np.unique(df['class'])
    

    which returns all the distinct values in your class column.

    Before building the mapping, replace the noisy value y with yes:

    df['class'] = df['class'].replace('y', 'yes')
    

    Because np.unique returns the values sorted, the mapping variable now holds your desired output:

    {'no': 0, 'yes': 1}
    

    Complete code:

    import numpy as np 
    import pandas as pd
    
    df = pd.DataFrame(['no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'y'], columns=['class'])
    
    df['class'] = df['class'].replace('y', 'yes')  # replace the noisy value
    mapping = {label: idx for idx, label in enumerate(np.unique(df['class']))}  # build the mapping dict
    df['class'] = df['class'].map(mapping)  # map each label to its integer
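
For comparison, here is a minimal sketch of why the original attempt fails, plus two alternatives to the np.unique approach. It is not part of the answer above: pd.factorize and the explicit dict are swapped-in techniques, and the names bad_mapping, cleaned, and explicit are made up for illustration.

```python
import numpy as np
import pandas as pd

labels = ['no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'y']
df = pd.DataFrame({'class': labels})

# The original attempt: np.unique(['class']) sees a one-element list holding
# the string 'class', not the column, so the mapping has a single key.
bad_mapping = {label: idx for idx, label in enumerate(np.unique(['class']))}
bad_result = df['class'].map(bad_mapping)  # no key matches any row, so all NaN

# Clean the noisy value first (same step as in the answer).
cleaned = df['class'].replace('y', 'yes')

# Alternative 1 (assumption: alphabetical codes are acceptable):
# pd.factorize assigns integer codes; sort=True orders the labels first.
codes, uniques = pd.factorize(cleaned, sort=True)
factorize_mapping = {label: idx for idx, label in enumerate(uniques)}

# Alternative 2: an explicit dict, so the encoding does not depend on
# which labels happen to be present in the data.
explicit = cleaned.map({'no': 0, 'yes': 1})
```

The explicit dict is probably the safer choice when the same encoding must be reused on new data, because any unexpected label shows up as NaN instead of silently shifting the codes.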