python machine-learning scikit-learn naivebayes

What is the expected result of this multinomial naive Bayes model code?


What should the expected result be?

When I calculated it manually, I got P(y=1|x=1) > P(y=0|x=1), but the model predicts 0.

    from sklearn.naive_bayes import GaussianNB, MultinomialNB
    xx = [[1], [1], [1], [2], [2], [3]]
    yy = [1, 1, 1, 0, 0, 0]
    # clf = GaussianNB()
    clf = MultinomialNB()
    clf.fit(xx, yy)
    print(clf.predict([[1]]))
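One way to see what the fitted model has actually learned is to print its per-class log-probabilities and the posterior for x = 1. A minimal sketch on the same data, using only the standard fitted attributes of MultinomialNB (the variable name proba is just illustrative):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

xx = [[1], [1], [1], [2], [2], [3]]
yy = [1, 1, 1, 0, 0, 0]

clf = MultinomialNB()  # default alpha=1.0
clf.fit(xx, yy)

# Classes are stored in sorted order
print(clf.classes_)            # [0 1]
# log P(feature | class): with a single feature column, both rows are log(1) = 0
print(clf.feature_log_prob_)
# log prior: 3 samples per class, so both entries equal log(0.5)
print(clf.class_log_prior_)
# Posterior for x = 1: the two classes tie at 0.5 each
proba = clf.predict_proba([[1]])
print(proba)
# On a tie, predict takes the argmax, which returns the first class in classes_, i.e. 0
print(clf.predict([[1]]))      # [0]
```

Because the posterior is an exact 50/50 split, the reported class comes from tie-breaking rather than from anything the model learned about x = 1.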

I also tried changing the alpha parameter from 1 to 1000, but the output is still 0 for input 1.
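The scikit-learn documentation gives the smoothed estimate used by MultinomialNB as θ_yi = (N_yi + α) / (N_y + α n), where N_yi is the total count of feature i in class y, N_y is the total count over all features in class y, and n is the number of features. With the single feature column here (n = 1), N_y = N_y1, so α cancels for any value, which is consistent with alpha having no effect:

```latex
\begin{aligned}
\hat{\theta}_{01} &= \frac{N_{01} + \alpha}{N_{0} + \alpha} = \frac{7 + \alpha}{7 + \alpha} = 1,
\qquad
\hat{\theta}_{11} = \frac{N_{11} + \alpha}{N_{1} + \alpha} = \frac{3 + \alpha}{3 + \alpha} = 1, \\
\log \hat{P}(y) + x \log \hat{\theta}_{y1} &= \log \hat{P}(y) = \log \tfrac{3}{6} \quad \text{for both } y \in \{0, 1\}.
\end{aligned}
```

If the manual calculation instead treated x as a category (an assumption, since the working isn't shown), the unsmoothed estimates would be P(x=1|y=1) = 3/3 and P(x=1|y=0) = 0/3, which does favour y = 1; that is a categorical model, not a multinomial one.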


Solution

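  • For multinomial naive Bayes, the model assumes features are counts drawn from a multinomial distribution, not category labels. With a single feature column, each class assigns that feature a probability of 1, so the likelihood term is identical for both classes; with equal priors (three samples per class) the two scores tie, and predict returns the first class in classes_, which is 0. This is also why changing alpha makes no difference. Encoding each value as its own indicator feature lets the model tell the values apart, as the following code shows: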

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.preprocessing import MultiLabelBinarizer
    
    xx = [[1],[1],[1],[2],[2],[3]]
    yy = [1,1,1,0,0,0]
    
    # Turn each value (1, 2 or 3) into its own indicator column
    mlb = MultiLabelBinarizer()
    xxtransformed = mlb.fit_transform(xx)
    print(xxtransformed)
    # [[1 0 0]
    #  [1 0 0]
    #  [1 0 0]
    #  [0 1 0]
    #  [0 1 0]
    #  [0 0 1]]
    
    clf = MultinomialNB()
    clf.fit(xxtransformed, yy)
    # Encode the query with the same fitted binarizer
    print(mlb.transform(np.array([[1]])))
    # [[1 0 0]]
    print(clf.predict(mlb.transform(np.array([[1]]))))
    # [1]
    

    And indeed, with the one-hot encoded input we get the expected prediction of 1.
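If the values 1, 2 and 3 are meant as categories, scikit-learn's CategoricalNB (available since version 0.22) models them directly, without a separate encoding step. This is a swapped-in alternative rather than the approach used above, and it assumes the feature values are non-negative integer category codes:

```python
from sklearn.naive_bayes import CategoricalNB

xx = [[1], [1], [1], [2], [2], [3]]
yy = [1, 1, 1, 0, 0, 0]

# CategoricalNB treats each feature value as a category label rather than a count,
# so x = 1, x = 2 and x = 3 are modelled as distinct outcomes.
clf = CategoricalNB()  # default alpha=1.0 (Laplace smoothing)
clf.fit(xx, yy)

print(clf.predict([[1]]))        # [1]
print(clf.predict_proba([[1]]))  # approximately [[0.2 0.8]]
```

With the default smoothing, the model infers four categories for the feature (0 to 3, from the largest value seen), so P(x=1|y=1) = (3+1)/(3+4) = 4/7 and P(x=1|y=0) = (0+1)/(3+4) = 1/7; with equal priors this gives a posterior of about 0.8 for class 1.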