I'm trying to get a one-hot encoding of a single pandas dataframe column. Here's what I've got:
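import pandas as pd
from sklearn.preprocessing import OneHotEncoder
OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse=False)  # on scikit-learn >= 1.2 this parameter is named sparse_output
OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(X_train['time_of_day']))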
When I run this, I get a long traceback that ends with:
ValueError: Expected 2D array, got 1D array instead:
I can't seem to figure it out.
Here is some sample data:
X_train = pd.DataFrame({'ID': ['1234', '5678', '5678', '1234'],
                        'time_of_day': ['Morning', 'Afternoon', 'Evening', 'Morning']})
Any help is appreciated!
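You are passing a Series, not a DataFrame. OneHotEncoder expects 2D input of shape (n_samples, n_features), but selecting a column with single brackets returns a 1D Series: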
type(X_train['time_of_day'])
pandas.core.series.Series
Use double brackets, X_train[['time_of_day']], to select the column as a one-column DataFrame instead:
type(X_train[['time_of_day']])
pandas.core.frame.DataFrame
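To see the difference directly, compare ndim and shape of the two selections. This sketch reuses the sample data from the question; it also shows that the 1D Series is rejected with a ValueError, and that reshaping the values into a single column with reshape(-1, 1) (which the full error message suggests) is an alternative to double brackets. The variable names series, frame, column and raised are only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'ID': ['1234', '5678', '5678', '1234'],
                        'time_of_day': ['Morning', 'Afternoon', 'Evening', 'Morning']})

series = X_train['time_of_day']    # single brackets -> 1D Series
frame = X_train[['time_of_day']]   # double brackets -> 2D, one-column DataFrame

print(series.ndim, series.shape)   # 1 (4,)
print(frame.ndim, frame.shape)     # 2 (4, 1)

encoder = OneHotEncoder(handle_unknown='ignore')
try:
    encoder.fit_transform(series)  # 1D input is rejected by the encoder
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', str(err).splitlines()[0])

# Alternative: turn the values into a single column (n_samples, 1) explicitly.
column = series.to_numpy().reshape(-1, 1)
encoded_shape = encoder.fit_transform(column).shape
print(encoded_shape)               # (4, 3): 4 rows, 3 categories
```

Double brackets are usually the nicer option because the encoder then also records the column name, which it uses when naming the output columns.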
So the encoding line becomes:
OH_cols_train = pd.DataFrame(OH_encoder.fit_transform(X_train[['time_of_day']]))