I am working on a spam-filtering data mining project and am currently using WEKA's NaiveBayesMultinomial classifier to separate spam from non-spam (ham) based on word-occurrence frequencies.
The problem is that WEKA sets the classification threshold to 0.5 by default. However, misclassifying a non-spam message as spam is more harmful than the reverse.
I want to adjust the threshold of WEKA's NaiveBayesMultinomial algorithm to see how the confusion matrix changes. If that is not directly possible, how can I use WEKA's output to compute a confusion matrix for a different threshold?
Here is a summary of the project's current results when evaluated on the test split:
Summary:
Correctly Classified Instances 2715 98.4766 %
Incorrectly Classified Instances 42 1.5234 %
Kappa statistic 0.9679
Mean absolute error 0.0184
Root mean squared error 0.1136
Relative absolute error 3.8317 %
Root relative squared error 23.2509 %
Total Number of Instances 2757
Detailed Accuracy By Class:
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.998    0.035    0.978      0.998   0.988      0.998     ham
               0.965    0.002    0.996      0.965   0.98       0.999     spam
Weighted Avg.  0.985    0.022    0.985      0.985   0.985      0.998
Confusion Matrix:
    a     b   <-- classified as
 1669     4 |    a = ham
   38  1046 |    b = spam
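To make clear how the per-class figures relate to the confusion matrix, here is a minimal Python sketch that recomputes TP rate, FP rate, and precision from the four counts above. The function name `per_class_rates` and the dictionary layout are my own choices for illustration; they are not part of WEKA.

```python
# Confusion matrix copied from the WEKA output above
# (outer key = actual class, inner key = predicted class).
matrix = {
    "ham": {"ham": 1669, "spam": 4},
    "spam": {"ham": 38, "spam": 1046},
}


def per_class_rates(matrix, cls):
    """Return (TP rate, FP rate, precision) for one class."""
    tp = matrix[cls][cls]
    # Instances of this class that were predicted as something else.
    fn = sum(n for pred, n in matrix[cls].items() if pred != cls)
    # Instances of other classes that were predicted as this class.
    fp = sum(row[cls] for actual, row in matrix.items() if actual != cls)
    # Instances of other classes that were not predicted as this class.
    tn = sum(
        n
        for actual, row in matrix.items() if actual != cls
        for pred, n in row.items() if pred != cls
    )
    return tp / (tp + fn), fp / (fp + tn), tp / (tp + fp)


total = sum(sum(row.values()) for row in matrix.values())
accuracy = sum(matrix[c][c] for c in matrix) / total

for cls in matrix:
    tpr, fpr, prec = per_class_rates(matrix, cls)
    print(f"{cls}: TP rate={tpr:.3f} FP rate={fpr:.3f} precision={prec:.3f}")
print(f"accuracy={accuracy:.4%}")
```

The 4 ham messages classified as spam are the costly errors here, so the ham TP rate (equivalently, the spam FP rate) is the figure I care about most when moving the threshold.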
I searched on Google, and it seems this cannot be done directly in WEKA.
However, it is still feasible via 'Test options' -> 'More options...' -> 'Output predictions'. WEKA then outputs the predicted probability for each test sample.
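As a rough sketch of how that output could be read in another tool, the Python below parses the predictions and converts them into a spam probability per instance. It rests on two assumptions that should be checked against the actual output: that the predictions were exported as CSV with the columns `inst#,actual,predicted,error,prediction`, and that the `prediction` column holds the probability of the predicted class. The sample rows are invented for illustration and are not from my run; `read_spam_probabilities` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the ASSUMED CSV layout; not real output from my run.
SAMPLE = """inst#,actual,predicted,error,prediction
1,1:ham,1:ham,,0.998
2,2:spam,2:spam,,0.91
3,2:spam,1:ham,+,0.62
"""


def read_spam_probabilities(text, positive="spam"):
    """Return a list of (actual_label, P(positive)) pairs."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        # Labels are assumed to look like "1:ham"; keep the part after the colon.
        actual = rec["actual"].split(":", 1)[1]
        predicted = rec["predicted"].split(":", 1)[1]
        p = float(rec["prediction"])
        # Assumption: 'prediction' is the probability of the predicted class,
        # so with two classes the other class gets 1 - p.
        p_pos = p if predicted == positive else 1.0 - p
        rows.append((actual, p_pos))
    return rows


for actual, p_spam in read_spam_probabilities(SAMPLE):
    print(f"actual={actual} P(spam)={p_spam:.3f}")
```

Converting everything to a single P(spam) value per instance is what makes it possible to apply any threshold afterwards, not just 0.5.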
From there, I can use another tool to apply a different threshold and rebuild the confusion matrix.
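A minimal sketch of that last step, assuming I already have one (actual label, P(spam)) pair per test instance: an instance is labelled spam only when its spam probability reaches the chosen threshold, and the four cells of the confusion matrix are counted from that. The scores below are made up for illustration, and `confusion_at_threshold` is a hypothetical helper, not a WEKA function.

```python
def confusion_at_threshold(scored, threshold, positive="spam", negative="ham"):
    """Count (actual, predicted) pairs when P(positive) >= threshold means positive."""
    labels = (negative, positive)
    counts = {(a, p): 0 for a in labels for p in labels}
    for actual, p_pos in scored:
        predicted = positive if p_pos >= threshold else negative
        counts[(actual, predicted)] += 1
    return counts


# Hypothetical (actual label, P(spam)) pairs; not taken from my results.
scored = [
    ("ham", 0.02), ("ham", 0.55), ("ham", 0.10),
    ("spam", 0.97), ("spam", 0.70), ("spam", 0.40),
]

for t in (0.5, 0.9):
    c = confusion_at_threshold(scored, t)
    print(f"threshold={t}")
    print(f"  ham  -> ham: {c[('ham', 'ham')]}  spam: {c[('ham', 'spam')]}")
    print(f"  spam -> ham: {c[('spam', 'ham')]}  spam: {c[('spam', 'spam')]}")
```

Raising the threshold above 0.5 trades missed spam for fewer ham messages flagged as spam, which is the direction I want given the costs described above.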