python python-3.x machine-learning scikit-learn sklearn-pandas

Unpack error while using sklearn ColumnTransformer

I was trying to one-hot encode a dataframe for some testing.

I tried using the regular OneHotEncoder from sklearn, but it seemed to have issues with NaN values (NaN values that were not present in the columns I wanted to encode).

From what I found, one solution is to use a ColumnTransformer, which can apply the encoding to only certain columns, something like the following:

ct = ColumnTransformer([(OneHotEncoder(categories = categories_list),['col1','col2','col3'])])

Here categories_list is a list of all the categories present.

The problem is that when I try to apply this transformer to my dataframe, I always get a "not enough values to unpack" error.

I'm transforming like this:

ct.fit_transform(df_train_xgboost)

Any idea what I should do?

EDIT:

Some example data:

id | col1  | col2  | col3 | price | has_something
1  | blue  | car   | new  | 23781 | NaN
2  | green | truck | used | 24512 | 1
3  | red   | van   | new  | 44521 | 0

Some more code

categories_list = ['blue','green','red','car','truck','van','new','used']
df_train_xgboost = df_train
df_train_xgboost = df_train_xgboost.drop(columns_I_dont_want, axis=1)
df_train_xgboost = df_train_xgboost.fillna(value = {'col1': 0, 'col2': 0, 'col3': 0})

ct = ColumnTransformer([(OneHotEncoder(categories = categories_list),['col1','col2','col3'])])

print(df_train_xgboost.shape)
ct.fit_transform(df_train_xgboost)

Solution

First of all, the use of `ColumnTransformer` is not necessary.

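To make your code work, each transformer entry needs one more element: the "name" of the transformer. `ColumnTransformer` expects tuples of the form (name, transformer, columns), so a two-element tuple triggers the "not enough values to unpack" error.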
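The error message itself is what ordinary tuple unpacking produces. As a rough illustration in plain Python (the loop below is a hypothetical stand-in, not scikit-learn's actual source), an entry with only two elements cannot be unpacked into three names:

```python
# Hypothetical stand-in for how a list of (name, transformer, columns)
# entries gets consumed; this is NOT scikit-learn's real code.
steps = [("some_transformer", ["col1", "col2", "col3"])]  # the name is missing

try:
    for name, transformer, columns in steps:
        pass
    error_message = None
except ValueError as exc:
    # Unpacking a 2-tuple into 3 names raises ValueError
    error_message = str(exc)

print(error_message)
```

Adding a string name as the first element of the tuple gives it the three elements the unpacking expects.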

Full example:

df
    col1   col2  col3
0   blue    car   new
1  green  truck  used
2    red    van   new

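from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Each entry is (name, transformer, columns); the columns are given by position here
ct = ColumnTransformer([("onehot", OneHotEncoder(), [0, 1, 2])])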

ct.fit_transform(df.values)
array([[1., 0., 0., 1., 0., 0., 1., 0.],
       [0., 1., 0., 0., 1., 0., 0., 1.],
       [0., 0., 1., 0., 0., 1., 1., 0.]])

Now notice that you get the same output by only using OneHotEncoder:

o = OneHotEncoder()
o.fit_transform(df).toarray()

array([[1., 0., 0., 1., 0., 0., 1., 0.],
       [0., 1., 0., 0., 1., 0., 0., 1.],
       [0., 0., 1., 0., 0., 1., 1., 0.]])
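If you do want to keep the `ColumnTransformer` so that the other columns are left untouched, a minimal sketch using the sample data from the question could look like the following. It rests on two assumptions that are not in the original code: `remainder="passthrough"` is used so that `id`, `price` and `has_something` pass through unencoded (NaN included), and `categories_list` is rewritten as one list per encoded column, since the `categories` parameter of `OneHotEncoder` takes a list of array-likes, one per column, rather than a single flat list.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample data copied from the question
df_train_xgboost = pd.DataFrame({
    "id": [1, 2, 3],
    "col1": ["blue", "green", "red"],
    "col2": ["car", "truck", "van"],
    "col3": ["new", "used", "new"],
    "price": [23781, 24512, 44521],
    "has_something": [np.nan, 1, 0],
})

# Assumption: one list of categories per encoded column, in column order
categories_list = [
    ["blue", "green", "red"],
    ["car", "truck", "van"],
    ["new", "used"],
]

ct = ColumnTransformer(
    # (name, transformer, columns): the name is the element that was missing
    [("onehot", OneHotEncoder(categories=categories_list), ["col1", "col2", "col3"])],
    remainder="passthrough",  # assumption: keep the remaining columns as they are
)

encoded = ct.fit_transform(df_train_xgboost)
print(encoded.shape)  # 8 one-hot columns + 3 passthrough columns
```

The one-hot columns come first in the result, followed by the passthrough columns in their original order.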
