I have the following DataFrame:
text           count  daytime    label
I think...     4      morning    pos
You should...  3      afternoon  neg
Better...      7      evening    neu
I first tried to preprocess only the text column with ColumnTransformer:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
transformer = ColumnTransformer([
('vectorizer', TfidfVectorizer(ngram_range=(1, 1)), 'text')
], remainder='passthrough')
That worked fine. Next, I wanted to apply StandardScaler to count and OneHotEncoder to daytime, using the following code:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
transformer = ColumnTransformer([
('vectorizer', TfidfVectorizer(ngram_range=(1, 1)), 'text'),
('scaler', StandardScaler(), 'count'),
('enc', OneHotEncoder(), 'daytime')
], remainder='passthrough')
X_transformed = transformer.fit_transform(X)
It gave me the following error:
1D data passed to a transformer that expects 2D data. Try to specify the column selection as a list of one item instead of a scalar.
I think the problem is with StandardScaler, which is only being passed 1D data. How do I solve this?
You have to separate each tuple in the list of tuples with commas. Since StandardScaler and OneHotEncoder expect 2D input, you should, as the error message suggests, pass the column selectors for these transformers as one-item lists:
transformer = ColumnTransformer([
('vectorizer', TfidfVectorizer(ngram_range=(1, 1)), 'text'),
('scaler', StandardScaler(), ['count']),
('enc', OneHotEncoder(), ['daytime'])
], remainder='passthrough')
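To see why the list form matters, here is a minimal end-to-end sketch. The DataFrame contents are made up for illustration (the question only shows truncated text values), and X is assumed to be the frame without the label column. A string selector such as 'count' hands the transformer a 1D Series, while a one-item list such as ['count'] hands it a 2D DataFrame. TfidfVectorizer is the exception: it expects a 1D sequence of strings, so 'text' stays a plain string.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sample data modelled on the question's table (label column left out).
X = pd.DataFrame({
    "text": ["I think this works", "You should try again", "Better luck next time"],
    "count": [4, 3, 7],
    "daytime": ["morning", "afternoon", "evening"],
})

# A string selector yields a 1D Series; a one-item list yields a 2D DataFrame.
print(X["count"].shape)    # (3,)
print(X[["count"]].shape)  # (3, 1)

transformer = ColumnTransformer([
    # TfidfVectorizer wants a 1D sequence of documents, so use a scalar selector.
    ("vectorizer", TfidfVectorizer(ngram_range=(1, 1)), "text"),
    # StandardScaler and OneHotEncoder want 2D input, so use one-item lists.
    ("scaler", StandardScaler(), ["count"]),
    ("enc", OneHotEncoder(), ["daytime"]),
], remainder="passthrough")

X_transformed = transformer.fit_transform(X)
print(X_transformed.shape)  # one row per sample; TF-IDF + scaled + one-hot columns
```

Depending on how sparse the combined result is, fit_transform may return either a SciPy sparse matrix or a dense NumPy array; the sparse_threshold parameter of ColumnTransformer controls this.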