Consider I have the following dataframe:
Survived Pclass Sex Age Fare
0 0 3 male 22.0 7.2500
1 1 1 female 38.0 71.2833
2 1 3 female 26.0 7.9250
3 1 1 female 35.0 53.1000
4 0 3 male 35.0 8.0500
I used the get_dummies() function to create dummy variable. The code and output are as follows:
one_hot = pd.get_dummies(dataset, columns = ['Category'])
This will return:
Survived Pclass Age Fare Sex_female Sex_male
0 0 3 22 7.2500 0 1
1 1 1 38 71.2833 1 0
2 1 3 26 7.9250 1 0
3 1 1 35 53.1000 1 0
4 0 3 35 8.0500 0 1
What I would like to have is a single column for Sex having the values 0 or 1 instead of 2 columns.
Interestingly, when I used get_dummies() on a different dataframe, it worked just like I wanted.
For the following dataframe:
Category Message
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup final...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
With the code:
one_hot = pd.get_dummies(dataset, columns = ['Category'])
It returns:
Message ... Category_spam
0 Go until jurong point, crazy.. Available only ... ... 0
1 Ok lar... Joking wif u oni... ... 0
2 Free entry in 2 a wkly comp to win FA Cup fina... ... 1
3 U dun say so early hor... U c already then say... ... 0
4 Nah I don't think he goes to usf, he lives aro... ... 0
Here are multiple ways you can do:
from sklearn.preprocessing import LabelEncoder
lbl=LabelEncoder()
df['Sex_encoded'] = lbl.fit_transform(df['Sex'])
# using only pandas
df['Sex_encoded'] = df['Sex'].map({'male': 0, 'female': 1})
Survived Pclass Sex Age Fare Sex_encoded
0 0 3 male 22.0 7.2500 0
1 1 1 female 38.0 71.2833 1
2 1 3 female 26.0 7.9250 1
3 1 1 female 35.0 53.1000 1
4 0 3 male 35.0 8.0500 0