python pandas data-mining enumerate data-preprocessing

How to convert nominal data to numeric in Python?

I am using a binary classification dataset and want to convert its nominal class labels to numeric values. What should I do?

age | class
------------
 1 |  no
 2 |  yes
 3 |  no
 4 |  yes
 5 |  no
 6 |  no
 7 |  no
 8 |  yes
 9 |  no
10 |  y

Code:

mapping = {label:idx for idx,label in enumerate(np.unique(['class']))}
df['class'] = df['class'].map(mapping)

Desired output: {'no': 0, 'yes': 1}

Solution

The problem with your code is this expression:

np.unique(['class'])

This computes the unique values of the list ['class'], which contains only the single string 'class', not the values in your column. Change it to:

np.unique(df['class'])

which returns all the distinct values in your class column.
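To make the difference concrete, here is a small sketch on a shortened version of your column (the four-row frame and the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data (illustrative only)
df = pd.DataFrame({'class': ['no', 'yes', 'no', 'y']})

# Unique values of a one-element Python list: just the string 'class'
literal_values = np.unique(['class'])

# Unique values of the actual column, returned in sorted order
column_values = np.unique(df['class'])

print(literal_values)
print(column_values)
```

The first call never looks at the DataFrame at all, which is why the original mapping had nothing useful in it.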

Before building the mapping, though, replace the noisy value y with yes:

df['class'] = df['class'].replace('y', 'yes')
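If you are not sure which labels are noisy, one way to check is to count the distinct values before and after the replacement. This is only a sketch using the data from your question; the variable names are my own:

```python
import pandas as pd

df = pd.DataFrame({'class': ['no', 'yes', 'no', 'yes', 'no',
                             'no', 'no', 'yes', 'no', 'y']})

# Count each distinct label; rare, unexpected values such as 'y' stand out
counts_before = df['class'].value_counts()
print(counts_before)

# Fold the noisy label into the intended one
df['class'] = df['class'].replace('y', 'yes')

# Only the two real classes should remain
labels_after = sorted(df['class'].unique())
print(labels_after)
```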

Because np.unique returns the values in sorted order, the mapping variable now holds your desired output:

{'no': 0, 'yes': 1}

Complete code:

import numpy as np
import pandas as pd

df = pd.DataFrame(['no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'y'], columns=['class'])

df['class'] = df['class'].replace('y', 'yes')  # replace the noisy label
mapping = {label: idx for idx, label in enumerate(np.unique(df['class']))}  # build the label -> index dict
df['class'] = df['class'].map(mapping)  # map each label to its index
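If you need this for several columns, the same steps can be wrapped in a small function. This is just a sketch: encode_labels and its aliases parameter are hypothetical names of mine, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def encode_labels(series, aliases=None):
    """Hypothetical helper: clean noisy labels, then map each label to an integer."""
    # Apply the alias replacements (e.g. {'y': 'yes'}) only if any were given
    cleaned = series.replace(aliases) if aliases else series
    # np.unique returns sorted distinct values, so indices follow sorted order
    mapping = {label: idx for idx, label in enumerate(np.unique(cleaned))}
    return cleaned.map(mapping), mapping

df = pd.DataFrame({'class': ['no', 'yes', 'no', 'yes', 'no',
                             'no', 'no', 'yes', 'no', 'y']})

df['class'], mapping = encode_labels(df['class'], aliases={'y': 'yes'})

print(mapping)
print(df['class'].tolist())
```

Returning the mapping alongside the encoded column lets you translate the numbers back to labels later.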