TypeError: only size-1 arrays can be converted to Python scalars when computing f1_score


I am new to NLP and am trying to classify a dataset with GaussianNB() and evaluate the result with f1_score. I got this TypeError when calling the f1_score function. Here is my code:

dev_X_train, dev_X_test, dev_y_train, dev_y_test = train_test_split(dev_X, dev_y, test_size = 0.2, random_state =0)
classifier = GaussianNB()

dev_y_train = dev_y_train.astype(numpy.int)
dev_y_test = dev_y_test.astype(numpy.int)

classifier.fit(dev_X_train, dev_y_train)
dev_y_pred = classifier.predict(dev_X_test)

dev_y_pred = dev_y_pred.astype(numpy.int)

score = f1_score(dev_y_test, dev_y_pred, pos_label=1)
print('F1 Score: %.3f' % dev_y_pred)

This is what the training and testing data look like:

 dev_X_train:
 <class 'numpy.ndarray'> len=80 
 [[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]
dev_y_train:
 <class 'numpy.ndarray'> len=80 
 [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0 1
 1 1 0 1 1 1 0 1 0 1 1 0 1 0 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0
 1 0 0 1 0 1]
dev_X_test:
 <class 'numpy.ndarray'> len=20 
 [[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]]

dev_y_test:
 <class 'numpy.ndarray'> len=20 [1 1 1 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0]

dev_y_pred:
 <class 'numpy.ndarray'> len=20 [1 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0]

I have tried .astype(numpy.int), as others suggested, but the outcome is the same. Could you please explain why this happens and how to fix it?

Here is the full traceback:

Traceback (most recent call last):
  File "/Users/chenchiyu/Desktop/COMP90042 NLP/Project/proj.py", line 241, in <module>
    print('F1 Score: %.3f' % dev_y_pred)
TypeError: only size-1 arrays can be converted to Python scalars

Solution

  • The error comes from your print call, not the f1_score call, as the stack trace shows. The format specifier %.3f expects a single float, but you are passing the entire array dev_y_pred rather than a scalar, and NumPy cannot convert a 20-element array to one Python float. You probably meant to format the score variable instead: print('F1 Score: %.3f' % score)
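
To make the difference concrete, here is a minimal sketch that reproduces the failure and the fix using the dev_y_test and dev_y_pred values printed in the question. To keep it dependent on NumPy only, it uses a small hand-written helper, binary_f1, as a stand-in for scikit-learn's f1_score. The helper name and its implementation are assumptions for illustration and are not part of the original code.

```python
import numpy as np

# Labels copied from the dev_y_test and dev_y_pred output shown in the question.
dev_y_test = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
dev_y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])


def binary_f1(y_true, y_pred, pos_label=1):
    """Hypothetical stand-in for sklearn's f1_score: F1 for the positive class."""
    tp = int(np.sum((y_true == pos_label) & (y_pred == pos_label)))
    fp = int(np.sum((y_true != pos_label) & (y_pred == pos_label)))
    fn = int(np.sum((y_true == pos_label) & (y_pred != pos_label)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


score = binary_f1(dev_y_test, dev_y_pred)

# Formatting the whole 20-element array with a single-float specifier fails.
broken = None
error_message = None
try:
    broken = 'F1 Score: %.3f' % dev_y_pred
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

# Formatting the scalar score works.
fixed = 'F1 Score: %.3f' % score
print(fixed)
```

With these labels there are 4 true positives, 2 false positives and 3 false negatives, so the F1 score is 8/13 and the last line prints F1 Score: 0.615.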