weka naivebayes confusion-matrix

How can I change the classification threshold in NaiveBayesMultinomial, or compute the confusion matrix manually, in Weka?


I am working on a spam-filtering data mining project and I am currently using the NaiveBayesMultinomial classifier to separate spam from non-spam based on word occurrence frequencies.

The problem is that WEKA sets the classification threshold to 0.5 by default. However, misclassifying a non-spam message as spam is more harmful than the reverse.

I want to adjust the threshold of WEKA's NaiveBayesMultinomial algorithm to see how the confusion matrix changes. If that is not directly possible, how do I utilize the output from WEKA to compute a confusion matrix for a different threshold?


Here is a summary of the project's current results when evaluated on the test split:

Summary:

Correctly Classified Instances        2715               98.4766 %
Incorrectly Classified Instances        42                1.5234 %
Kappa statistic                          0.9679
Mean absolute error                      0.0184
Root mean squared error                  0.1136
Relative absolute error                  3.8317 %
Root relative squared error             23.2509 %
Total Number of Instances             2757

Detailed Accuracy By Class:

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.998     0.035      0.978     0.998     0.988      0.998    ham
                 0.965     0.002      0.996     0.965     0.98       0.999    spam
Weighted Avg.    0.985     0.022      0.985     0.985     0.985      0.998

Confusion Matrix:

   a    b   <-- classified as
1669    4 |   a = ham
  38 1046 |   b = spam
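
For reference, the per-class figures in the accuracy table can be re-derived from this confusion matrix by hand. The Python sketch below is only an illustration: the helper name `per_class_rates` is made up, and it assumes rows are actual classes and columns are predicted classes, in the order (ham, spam), as in the WEKA output above.

```python
# Confusion matrix copied from the WEKA output above.
# Rows = actual class, columns = predicted class, order (ham, spam).
matrix = [[1669, 4],
          [38, 1046]]


def per_class_rates(matrix, k):
    """Hypothetical helper: TP rate, FP rate and precision for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                            # class k predicted as class k
    actual_k = sum(matrix[k])                    # all instances that really are class k
    predicted_k = sum(row[k] for row in matrix)  # all instances predicted as class k
    fp = predicted_k - tp                        # other classes predicted as class k
    negatives = total - actual_k                 # all instances that are not class k
    return {
        "tp_rate": tp / actual_k,                # identical to recall
        "fp_rate": fp / negatives,
        "precision": tp / predicted_k,
    }


ham = per_class_rates(matrix, 0)
spam = per_class_rates(matrix, 1)
accuracy = (matrix[0][0] + matrix[1][1]) / sum(sum(row) for row in matrix)

print("ham ", {name: round(value, 3) for name, value in ham.items()})
print("spam", {name: round(value, 3) for name, value in spam.items()})
print("accuracy %", round(accuracy * 100, 4))
```

The 4 in the top-right cell is the number of ham messages flagged as spam, which is the error I most want to reduce.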

Solution

  • I searched around and it seems this cannot be done directly in WEKA.

    It is still feasible, though: under 'Test options', click 'More options...' and enable 'Output predictions'. WEKA then prints the predicted class probability for each test instance.

    From there, I can use another tool to apply a different threshold and recompute the confusion matrix.
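
As a rough sketch of that last step, here is how the confusion matrix could be recomputed in Python for any threshold. It assumes the predictions have already been parsed into two lists, the actual class of each test instance and its predicted probability of being spam. Parsing is left out because the exact WEKA output layout depends on the version. The function name `confusion_matrix` and the sample values are made up for illustration and are not taken from the results above.

```python
def confusion_matrix(actual, p_spam, threshold=0.5):
    """Return [[ham->ham, ham->spam], [spam->ham, spam->spam]].

    actual    : list of true labels, each "ham" or "spam"
    p_spam    : predicted probability of "spam" for each instance
    threshold : an instance is called spam only if p_spam >= threshold
    """
    index = {"ham": 0, "spam": 1}
    matrix = [[0, 0], [0, 0]]
    for label, p in zip(actual, p_spam):
        predicted = "spam" if p >= threshold else "ham"
        matrix[index[label]][index[predicted]] += 1
    return matrix


# Hypothetical predictions, for illustration only.
actual = ["ham", "ham", "ham", "spam", "spam", "spam"]
p_spam = [0.02, 0.40, 0.70, 0.55, 0.95, 0.99]

for t in (0.5, 0.9):
    print(t, confusion_matrix(actual, p_spam, t))
```

With these made-up values, raising the threshold from 0.5 to 0.9 removes the one ham message flagged as spam, at the cost of letting one spam message through. That is the trade-off the question is about.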