For most competitions, the data is split into a training set and a test set. I worked on the training data (splitting it into x_train, ...), created a model, evaluated it, and got the corresponding accuracy score. I then used the same model to predict on the test dataset that was held out; however, when I tried to evaluate the model's performance, I kept getting the error below. Could anyone explain what I am doing wrong and suggest ways to remedy it?
The Code:
# logistic regression
logreg_main_test = logreg.predict(main_test_scaled) # predict
# evaluate
logreg_score_main_test = accuracy_score(Y_test, logreg_main_test)
f1_val_main_test = f1_score(Y_test, logreg_main_test)        # pass the predicted labels, not the accuracy score
recall_val_main_test = recall_score(Y_test, logreg_main_test)
# display result
print('Model accuracy:', logreg_score_main_test)
The output error:
ValueError Traceback (most recent call last)
<ipython-input-51-8265b6fa0a29> in <module>
4
5 # evaluate
----> 6 logreg_score_main_test = accuracy_score(Y_test, logreg_main_test)
7 f1_val_main_test = f1_score(Y_test, logreg_score_main_test)
8 recall_val_main_test = recall_score(Y_test, logreg_score_main_test)
2 frames
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
330 uniques = np.unique(lengths)
331 if len(uniques) > 1:
--> 332 raise ValueError(
333 "Found input variables with inconsistent numbers of samples: %r"
334 % [int(l) for l in lengths]
ValueError: Found input variables with inconsistent numbers of samples: [4705, 10086]
The length (number of samples) of your ground-truth labels doesn't match the number of samples in your predicted labels.
Check the lengths of Y_test and logreg_main_test; they should match. If they don't, either your split is incorrect or you are predicting on a different dataset from the one Y_test was drawn from (for example, the train split instead of the test split).
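If main_test_scaled is the competition's held-out test set, it typically comes without labels, so its row count will generally not match Y_test, and you can only compute accuracy/F1/recall locally on the validation part of your own labelled training data. A minimal sketch of that check and evaluation (the names x_val_scaled and y_val are assumptions standing in for your hold-out split; the other names come from the question, and a binary target is assumed for f1_score/recall_score):
# sanity check: both arguments to a metric must contain the same number of samples
print(len(Y_test), len(logreg_main_test))

# evaluate on the labelled validation split instead
from sklearn.metrics import accuracy_score, f1_score, recall_score

val_pred = logreg.predict(x_val_scaled)      # same length as y_val by construction
print('Accuracy:', accuracy_score(y_val, val_pred))
print('F1:', f1_score(y_val, val_pred))
print('Recall:', recall_score(y_val, val_pred))

# predictions for the competition's main test set are generated the same way,
# but they are submitted for scoring rather than evaluated locally
main_test_pred = logreg.predict(main_test_scaled)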