I am trying to convert categorical values to integers using OneHotEncoder and ColumnTransformer. My understanding is that it should create dummy columns for the categorical columns, like pd.get_dummies does. My file has ~1500 records and 10 columns.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
cat_features=['COMPANY_NAME', 'BRAND_NAME']
enc=OneHotEncoder()
transformer = ColumnTransformer([("enc",
                                  enc,
                                  cat_features)],
                                remainder="passthrough")
df_transformed = transformer.fit_transform(df_model)
df_transformed
The result is:
<1574x37 sparse matrix of type '<class 'numpy.float64'>'
with 15513 stored elements in Compressed Sparse Row format>
When I try to look at the data after converting it into a DataFrame using:
What am I doing wrong? My data looks something like below:
You need to convert it to a dense array before putting it into a DataFrame (see the help page too):
pd.DataFrame(df_transformed.toarray())
Or you can set the transformer to always return a dense array; see the sparse_threshold option:
transformer = ColumnTransformer([("enc",
                                  enc,
                                  cat_features)],
                                remainder="passthrough",
                                sparse_threshold=0)