I am trying to use MultinomialNB from sklearn to classify some data. I have made a sample CSV with some labelled training data, which I want to use to train the model, but I receive the following error message:
ValueError: Expected 2D array, got 1D array instead: array=[0 1 2 2].
I know it is a very small dataset, but I will add more data once the code is working.
Here is my code:
import numpy as np
import pandas as pd
import array as array
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
data_file = pd.read_csv("CSV_Labels.csv", engine='python')
data_file.tail()
vectorizer = CountVectorizer(stop_words='english')
all_features = vectorizer.fit_transform(data_file.Word)
all_features.shape
x_train = data_file.label
y_train = data_file.Word
x_train.values.reshape(1, -1)
y_train.values.reshape(1, -1)
classifer = MultinomialNB()
classifer.fit(x_train, y_train)
Try this:
x_train = x_train.values.reshape(-1, 1)  # features: 2-D, one column
y_train = y_train.values                 # targets: keep 1-D
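NumPy reshape operations are not in-place: reshape returns a new array and leaves the original unchanged, so the arrays you're passing to the classifier still have their old shapes. Note also that reshape(1, -1) turns your four values into a single sample with four features, while reshape(-1, 1) gives four samples with one feature each, which is the (n_samples, n_features) layout the "Expected 2D array" message is asking for. The targets, on the other hand, should stay 1-D.
It also looks like x_train and y_train are swapped: you fit the CountVectorizer on Word, so the features would normally be all_features and the targets data_file.label.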
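Below is a minimal, self-contained sketch of both points, assuming the text to classify is in the Word column and the class is in the label column. The DataFrame is a hypothetical stand-in for CSV_Labels.csv: only the column names and the label values [0, 1, 2, 2] come from the question, the texts are invented for illustration. It first shows that reshape does not modify the array in place, then trains MultinomialNB on the vectorized text with 1-D labels.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# reshape returns a new array; the original keeps its shape
labels = np.array([0, 1, 2, 2])
labels.reshape(-1, 1)           # result is discarded, labels is unchanged
print(labels.shape)             # (4,)
column = labels.reshape(-1, 1)  # 4 samples x 1 feature
row = labels.reshape(1, -1)     # 1 sample x 4 features
print(column.shape, row.shape)  # (4, 1) (1, 4)

# Hypothetical stand-in for CSV_Labels.csv (texts are invented)
data_file = pd.DataFrame({
    "Word": [
        "cheap flights now",
        "meeting agenda attached",
        "quarterly report draft",
        "report figures updated",
    ],
    "label": [0, 1, 2, 2],
})

# Features: word counts, sparse matrix of shape (n_samples, n_terms)
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(data_file["Word"])

# Targets: one label per sample, 1-D array of shape (n_samples,)
y_train = data_file["label"].to_numpy()

classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# New text must go through the same fitted vectorizer (transform, not fit_transform)
new_text = vectorizer.transform(["report draft"])
prediction = classifier.predict(new_text)
print(prediction)  # [2]
```

Because CountVectorizer already returns a 2-D matrix, no reshape is needed at all once the text is used as the features.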