I want to build a Naive Bayes model using two dataframes (a train dataframe and a test dataframe).
Each dataframe contains 13 columns, but I only want to encode the values from str to int in 5-6 of them.
How can I encode those 6 columns directly with a single piece of code? I followed this answer:
https://stackoverflow.com/a/37159615/12977554
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

def encode_df(dataframe):
    le = LabelEncoder()
    for column in dataframe.columns:
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# encode the dataframe
encode_df(df)
But it only encodes one column; what I want is to encode 6 columns with a single piece of code.
You can loop over the columns you want to encode and call fit_transform on each one:
cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype('str'))
df
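As a small runnable illustration of the loop above, here is a sketch on made-up data. The `age` column and the `encoders` dictionary are assumptions added for the example (they are not in the question); keeping each fitted encoder around is just one way to be able to map the integers back to the original labels later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "colors": ["R", "G", "B", "B", "G"],
    "skills": ["Java", "C++", "SQL", "Java", "Python"],
    "age": [31, 25, 40, 28, 35],  # hypothetical numeric column, left untouched
})

cols = ["colors", "skills"]  # only the columns that need encoding
encoders = {}                # hypothetical helper: one fitted encoder per column

for col in cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype(str))
    encoders[col] = le

print(df)
# classes_ holds the original labels in sorted order; a label's index is its code
print(list(encoders["colors"].classes_))
```

Columns not listed in `cols` are not modified, so the same pattern should work for 6 out of 13 columns.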
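Ideally you want to use the same transformer for both the train and test datasets, so that a given category maps to the same integer in both.
For that, fit the encoder on the train data only and then transform both dataframes: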
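for col in cols:
    le = LabelEncoder()
    # learn the label-to-integer mapping from the training data only
    le.fit(df_train[col].astype('str'))
    df_train[col] = le.transform(df_train[col].astype('str'))
    # transform raises ValueError if df_test has a label not seen in df_train
    df_test[col] = le.transform(df_test[col].astype('str'))
df_train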
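One caveat with the loop above: LabelEncoder.transform raises a ValueError when df_test contains a category that never appeared in df_train. A possible alternative (a different technique from the one in this answer, shown only as a sketch) is scikit-learn's OrdinalEncoder, which is intended for feature columns, encodes several columns in one call, and can map unseen categories to a placeholder value. The sample data and the choice of -1 as the placeholder are assumptions for illustration; this needs scikit-learn 0.24 or newer.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# made-up data; "Y" and "Go" appear only in the test set
df_train = pd.DataFrame({
    "colors": ["R", "G", "B", "G"],
    "skills": ["Java", "C++", "SQL", "Python"],
})
df_test = pd.DataFrame({
    "colors": ["G", "R", "Y"],
    "skills": ["SQL", "Go", "Java"],
})

cols = ["colors", "skills"]

# unseen categories are encoded as -1 instead of raising an error
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# fit on the training data only, then reuse the same mapping for the test data
df_train[cols] = enc.fit_transform(df_train[cols]).astype(int)
df_test[cols] = enc.transform(df_test[cols]).astype(int)

print(df_train)
print(df_test)
```

Whether -1 is an acceptable input depends on the model; some estimators expect non-negative category codes, in which case dropping or remapping those rows may be a better fit.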