I have the following program:
cat_feats = ['x', 'y', 'z', 'a', 'b',
'c', 'd', 'e']
onehot_encoder = OneHotEncoder(categories='auto')
# convert each categorical feature from integer
# to one-hot
for feature in cat_feats:
    data[feature] = data[feature].array.reshape(len(data[feature]), 1)
    data[feature] = onehot_encoder.fit_transform(data[feature])
I am having issues with this. I get:
'PandasArray' object has no attribute 'reshape'
The output of data.head() before using the encoder is this:
0 2 1 4 6 3 2 1 37
2 1 7 2 10 0 4 1 37
3 2 15 2 6 0 2 1 37
5 2 0 4 7 1 4 1 37
7 4 14 2 9 0 4 1 37
This output is of type DataFrame and contains only integers which I am trying to convert to one-hot. I have tried .array, .values, .array.reshape(-1, 1), but none of these things are working. I found that trying .values seemed to work in the first line of the for loop, but I got garbage from my one-hot conversion.
Please help.
The following information might be helpful:
- data[feature] is a pandas.Series; data[feature].values is a numpy.ndarray.
- reshape is a method of numpy.ndarray but not of pandas.Series, so you need to use .values to get a numpy.ndarray first.
- When you assign a numpy.ndarray back to data[feature], automatic type conversion occurs, so data[feature] = data[feature].values.reshape(-1, 1) doesn't seem to do anything.
- fit_transform takes an array-like object as its argument, and it needs to be a 2D array (e.g. a pandas.DataFrame or a 2D numpy.ndarray), because sklearn.preprocessing.OneHotEncoder is designed to fit/transform multiple features at the same time. Passing a pandas.Series (a 1D array) will cause an error.
- fit_transform returns a sparse matrix (or a 2D array); assigning that to a single pandas.Series column may cause a disaster.

(Not recommended) If you insist on processing one feature after another:
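for feature in cat_feats:
    encoder = OneHotEncoder()
    tmp_ohe_data = pd.DataFrame(
        encoder.fit_transform(data[feature].values.reshape(-1, 1)).toarray(),
        # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
        columns=encoder.get_feature_names_out([feature]),
        # reuse the original row labels so pd.concat aligns rows correctly
        index=data.index,
    )
    data = pd.concat([tmp_ohe_data, data], axis=1).drop([feature], axis=1)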
I recommend encoding all the features at once, like this:
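encoder = OneHotEncoder()
ohe_data = pd.DataFrame(
    encoder.fit_transform(data[cat_feats]).toarray(),
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    columns=encoder.get_feature_names_out(),
    # reuse the original row labels so pd.concat aligns rows correctly
    index=data.index,
)
res = pd.concat([ohe_data, data], axis=1).drop(cat_feats, axis=1)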
pandas.get_dummies is also a good choice.
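A minimal sketch of the pandas.get_dummies route, again on made-up data (the frame and column names here are assumptions for illustration only):

```python
import pandas as pd

# Made-up data; "x" and "y" are integer-coded categories, "n" is left as-is
data = pd.DataFrame(
    {"x": [2, 1, 2], "y": [1, 7, 1], "n": [37, 37, 37]},
    index=[0, 2, 3],
)
cat_feats = ["x", "y"]

# columns= tells get_dummies to treat these integer columns as categorical
res = pd.get_dummies(data, columns=cat_feats)
print(res.columns.tolist())
```

One thing to keep in mind: get_dummies keeps no fitted state, so if you later need to encode new data with exactly the same columns, OneHotEncoder is probably the safer choice.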