I am using a ColumnTransformer to combine two transformers: one that converts a time column into multiple features like day, month, week, etc. This is followed by an OHE transformer to encode the categorical columns.
I am using the code below:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

time_col = ['visitStartTime']

class TimeTransformer:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified
        X = X.copy()
        for column in X.columns:
            # Interpret the column as Unix timestamps in seconds
            X['time'] = pd.to_datetime(X[column], unit='s', origin='unix')
            X['day_of_week'] = X['time'].dt.strftime('%A')
            X['hour'] = X['time'].dt.hour
            X['day'] = X['time'].dt.day
            X['month'] = X['time'].dt.month
            X['year'] = X['time'].dt.year
            X = X.drop(['time'], axis=1)
        return X

# Transformer to handle visitStartTime
time_transformer = Pipeline(steps=[
    ('time', TimeTransformer())
])

# Transformer to encode categorical features
ohe_transformer = Pipeline(steps=[
    ('ohe', OneHotEncoder())
])

# Combined transformer
preprocessor = ColumnTransformer(transformers=[
    ('date', time_transformer, time_col),
    ('ohe', ohe_transformer, selector(dtype_include='object'))
], remainder='passthrough', sparse_threshold=0)

j = preprocessor.fit_transform(X_train)
When I check the output j, I see that the categorical columns created by time_transformer have not been encoded.
How can I correct this?
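OneHotEncoder has categories='auto' as its default setting, which means it determines the categories of each column it is given from the training data. Which columns it receives is decided by the ColumnTransformer, here through selector(dtype_include = 'object').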
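As a small check of what the default setting does, the sketch below fits OneHotEncoder on a hypothetical two-column frame (the column names and values are made up for illustration) and prints the categories it learns. Both the string column and the integer column end up encoded.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one string column and one integer column.
df = pd.DataFrame({
    "day_of_week": ["Monday", "Tuesday", "Monday"],
    "month": [1, 2, 1],
})

encoder = OneHotEncoder()  # categories='auto' is the default
encoded = encoder.fit_transform(df).toarray()

# categories_ holds one array of learned values per input column.
learned = [list(values) for values in encoder.categories_]
print(learned)
print(encoded.shape)
```

The encoder does not skip the numeric column: every column passed to it is treated as categorical.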
There are two things you can do:
1. Convert the columns to str, or better, to categorical: df[col] = df[col].astype('category')
2. Set the categories explicitly: OneHotEncoder(categories=[...]), with one list of expected values per column
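A different approach from the two options above, offered only as a sketch: ColumnTransformer applies each of its transformers to the original input columns and concatenates the results, so the output of one entry is never passed to another. Assuming the goal is to one-hot encode every feature derived from the timestamp, the two steps can be chained in a single Pipeline for the time column. The toy X_train, its column names other than visitStartTime, and the rewritten TimeTransformer are assumptions for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class TimeTransformer(TransformerMixin, BaseEstimator):
    """Expand a Unix-timestamp column into calendar features."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Assumes X is a one-column DataFrame of Unix timestamps in seconds.
        time = pd.to_datetime(X.iloc[:, 0], unit="s", origin="unix")
        return pd.DataFrame({
            "day_of_week": time.dt.strftime("%A"),
            "hour": time.dt.hour,
            "day": time.dt.day,
            "month": time.dt.month,
            "year": time.dt.year,
        }, index=X.index)


# Hypothetical toy data standing in for X_train.
X_train = pd.DataFrame({
    "visitStartTime": [0, 86400, 90000],
    # Explicit object dtype so selector(dtype_include="object") matches it.
    "browser": pd.Series(["Chrome", "Safari", "Chrome"], dtype="object"),
    "hits": [3, 1, 7],
})

# Derive the time features first, then encode them, all within one entry.
time_pipeline = Pipeline(steps=[
    ("time", TimeTransformer()),
    ("ohe", OneHotEncoder()),
])

preprocessor = ColumnTransformer(transformers=[
    ("date", time_pipeline, ["visitStartTime"]),
    ("ohe", OneHotEncoder(), selector(dtype_include="object")),
], remainder="passthrough", sparse_threshold=0)

j = preprocessor.fit_transform(X_train)
print(j.shape)
```

Here the derived features never need to be picked up by the dtype selector, because the encoder sits directly after TimeTransformer in the same pipeline. Whether hour, day, month, and year should be one-hot encoded or left numeric is a modelling choice this sketch does not settle.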