Tags: python-3.x, machine-learning, scikit-learn, naivebayes

ValueError: shapes (4155,1445) and (4587,7) not aligned: 1445 (dim 1) != 4587 (dim 0)


I trained a model on one dataset and am trying to predict on a different dataset, but I keep getting the error in the title.

I've tried changing the parameters, but it makes no difference.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=77)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((15484, 4587), (3871, 4587), (15484,), (3871,))

nb = MultinomialNB(alpha=0.01)
mnb = nb.partial_fit(X_train, y_train, classes)

Then I split my second dataset so that nearly all of it ends up in the test set:

X_train3, X_test3, y_train3, y_test3 = train_test_split(X3, y3, test_size = 0.99999, random_state=77)
X_train3.shape, X_test3.shape, y_train3.shape, y_test3.shape

((0, 1445), (4155, 1445), (0,), (4155,))

y_pred=mnb.predict(X_test3)

ValueError: shapes (4155,1445) and (4587,7) not aligned: 1445 (dim 1) != 4587 (dim 0)

I expected the model to be able to predict on my second dataset. Any help is appreciated. Thanks!


Solution

  • Have a look at the scikit-learn documentation for MultinomialNB.

    It specifies that the structure of the input data used for training (model.fit() or model.partial_fit()) must match the structure of the input data used for testing or scoring (model.predict()).

    This means you cannot apply the same model to a dataset with a different feature layout. It only works if both datasets have the same features: the same number of features, in the same order as in the training dataset.

    In your case this will not work, because the two datasets have different numbers of features, as their shapes show. The error comes from predict() multiplying your (4155, 1445) input by the model's (4587, 7) per-feature parameters, one column per class, and the inner dimensions do not match.

    Set 1 has 4587 features
    Set 2 has 1445 features
    

    Make sure both datasets have the same number of features, in the same order as in the training set.
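
One common way to get matching features is sketched below. It assumes the feature matrices come from a text vectorizer such as CountVectorizer, which the question does not actually show, so treat the data and variable names here as hypothetical. The idea is to fit the vectorizer once on the training data and then call transform() (not fit_transform()) on the second dataset, so both matrices share the same columns in the same order.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; the question does not show how X and X3 were built.
train_texts = [
    "cheap pills now",
    "meeting at noon",
    "cheap meeting room",
    "pills for sale",
]
train_labels = ["spam", "ham", "ham", "spam"]
new_texts = ["cheap pills", "lunch meeting tomorrow"]

# Fit the vocabulary once, on the training data only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

nb = MultinomialNB(alpha=0.01)
nb.fit(X_train, train_labels)

# Reuse the SAME fitted vectorizer: transform(), not fit_transform().
# Words unseen during training (e.g. "lunch") are simply ignored.
X_new = vectorizer.transform(new_texts)

# Both matrices now have the same number of columns, so predict() works.
print(X_train.shape, X_new.shape)
y_pred = nb.predict(X_new)
print(y_pred)
```

If the second dataset was vectorized separately (for example with its own fit_transform() call), its columns describe a different vocabulary, which would explain 1445 features versus 4587. In that case the second dataset needs to be rebuilt from the raw data with the preprocessing that was fitted on the training set.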