I am reading columns in .csv files as inputs to a sklearn Naive Bayes fit. However, I am running into these errors and warnings:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
and
ValueError: Found arrays with inconsistent numbers of samples: [ 1 10509]
And here is my code:
import csv
from collections import defaultdict
from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
columns = defaultdict(list)
with open('file.CSV', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        for i, v in enumerate(row):
            columns[i].append(v)
clf.fit(columns[9], columns[10])
As a note, len(columns[9]) and len(columns[10]) are both 10509.
As the warning suggested, I tried many different combinations of reshape(), flatten(), and ravel(), and also tried using NumPy arrays, but nothing seems to work.
Any suggestions? It seems that most people use some data structure other than a defaultdict, but I'm not sure how to use other data structures to read from a .csv file.
I found the solution to my problem. It seems the issue wasn't just the shape of the data: the values also had to be converted to a numeric type, since csv.reader returns every cell as a string.
import numpy as np

# X must be a 2-D numeric array of shape (n_samples, n_features)
x = np.array(columns[9]).reshape(-1, 1).astype(float)
y = np.array(columns[10])
clf.fit(x, y)
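
For anyone who wants to skip the defaultdict, here is a minimal sketch of the same idea done on the whole table at once. It is only an illustration under a few assumptions: the in-memory `sample_csv` is a made-up stand-in for the real file, it has just two columns (feature, then label) instead of columns 9 and 10, and every cell is assumed to parse as a number.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for the real CSV file: feature column, then label column.
sample_csv = io.StringIO("1.5,0\n2.5,1\n3.5,0\n4.5,1\n")

# csv.reader yields each row as a list of strings.
rows = list(csv.reader(sample_csv))

# Build a 2-D string array, then convert every cell to float.
data = np.array(rows).astype(float)

# Slicing with 0:1 (rather than indexing with 0) keeps the feature 2-D.
x = data[:, 0:1]
# Labels stay 1-D, one entry per sample.
y = data[:, 1].astype(int)

print(x.shape, x.dtype)  # (4, 1) float64
print(y.shape)           # (4,)
```

With x shaped (n_samples, 1) and y shaped (n_samples,), the two arrays should be usable in clf.fit(x, y) the same way as in the snippet above.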