I am using a binary classification dataset and want to convert the nominal class labels to numeric values. What should I do?
age | class
------------
1 | no
2 | yes
3 | no
4 | yes
5 | no
6 | no
7 | no
8 | yes
9 | no
10 | y
Code:
mapping = {label:idx for idx,label in enumerate(np.unique(['class']))}
df['class'] = df['class'].map(mapping)
Desired output: {'no': 0, 'yes': 1}
The problem with your code is this:
np.unique(['class'])
You are trying to find the unique values of the list ['class'], which contains only the single string 'class' rather than the values of the column. You should change it to:
np.unique(df['class'])
which contains all the distinct values of your class column.
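To see the difference, here is a minimal sketch. The small df below is made up for illustration and is not the full dataset from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, just to compare the two calls
df = pd.DataFrame({'class': ['no', 'yes', 'no', 'yes']})

# The list ['class'] holds one string, so its only unique value is 'class'
from_list = list(np.unique(['class']))

# The column df['class'] holds the actual labels; np.unique returns them sorted
from_column = list(np.unique(df['class']))

print(from_list)
print(from_column)
```

The first call never looks at the DataFrame at all, which is why the original mapping ended up with the key 'class' instead of the labels.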
But before that, you should replace the noisy value y with yes:
df['class'] = df['class'].replace('y', 'yes')
The mapping variable now has your desired output (np.unique returns the values sorted, so no comes before yes):
{'no': 0, 'yes': 1}
Complete code:
import numpy as np
import pandas as pd
df = pd.DataFrame(['no', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'y'], columns=['class'])
df['class'] = df['class'].replace('y', 'yes')  # replace the noisy label
mapping = {label: idx for idx, label in enumerate(np.unique(df['class']))}  # build the mapping dict
df['class'] = df['class'].map(mapping)  # map each label to its integer
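As a side note, if you are not sure which noisy labels are present, one possible way to find them is to map with an explicit dict first and look at what did not match. This is only a sketch under assumptions: the toy df and the names encoded and unmapped are hypothetical, and it relies on Series.map producing NaN for values missing from the dict.

```python
import pandas as pd

# Hypothetical toy frame containing one noisy label
df = pd.DataFrame({'class': ['no', 'yes', 'no', 'y']})

# Explicit mapping, so the codes do not depend on sort order
mapping = {'no': 0, 'yes': 1}
encoded = df['class'].map(mapping)

# Labels that are not keys of the mapping come back as NaN
unmapped = list(df.loc[encoded.isna(), 'class'].unique())
print(unmapped)
```

Any label listed in unmapped can then be cleaned with replace, as above, before the final mapping is applied.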