I'm trying to train a model with Sklearn. In short, I have a Pandas DataFrame with two columns: 'review', which holds the input (as text), and 'sentiment'. I'm having trouble converting the text input to numeric format with Sklearn's TfidfVectorizer.
With the following code:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
train_x_vector = tfidf.fit_transform(train_x)
test_x_vector = tfidf.transform(test_x)
from sklearn.svm import SVC
svc = SVC(kernel='linear')
svc.fit(train_x_vector, train_y)
I get the following error:
I have the suspicion that the problem is in converting the input to numeric data:
Any suggestions on how to solve it?
Thanks in advance!
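TfidfVectorizer expects an iterable of strings as input, such as a list, an array, or a pandas Series. In the code, train_x is a DataFrame, not a list or an array of strings. Iterating over a DataFrame yields its column labels rather than its rows, so the vectorizer treats the column names as the documents.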
Solution:
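train_x_vector = tfidf.fit_transform(train_x['review'].values)
# Assuming test_x is also a DataFrame with a 'review' column, select it the same way:
test_x_vector = tfidf.transform(test_x['review'].values)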
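To make the difference concrete, here is a minimal end-to-end sketch. The toy reviews and labels are made up for illustration and stand in for the real data, and it assumes both train_x and test_x are DataFrames with a 'review' column. It first shows what the vectorizer sees when it is given the whole DataFrame, then fits the SVC on the 'review' column instead.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy data standing in for the real reviews.
train_x = pd.DataFrame({"review": [
    "great movie, loved it",
    "terrible plot and bad acting",
    "wonderful and moving film",
    "boring, awful waste of time",
]})
train_y = pd.Series(["positive", "negative", "positive", "negative"])
test_x = pd.DataFrame({"review": ["loved the acting", "awful film"]})

# Passing the whole DataFrame: iteration yields the column label 'review',
# so the vectorizer sees a single document instead of four.
wrong = TfidfVectorizer().fit_transform(train_x)
print(wrong.shape[0], "row(s) from the DataFrame vs", len(train_x), "reviews")

# Passing the text column: one row per review.
tfidf = TfidfVectorizer(stop_words="english")
train_x_vector = tfidf.fit_transform(train_x["review"])
test_x_vector = tfidf.transform(test_x["review"])

svc = SVC(kernel="linear")
svc.fit(train_x_vector, train_y)
predictions = svc.predict(test_x_vector)
print(predictions)
```

Comparing train_x_vector.shape[0] with len(train_y) before calling fit is a quick way to catch this kind of mismatch.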