Tags: classification, weka

How to automatically optimize a classifier in Weka so that a given class contains only 100 % sure data?


I have two (or three) classes, and each instance can have only one label.
I want to optimize (automatically, if possible) the parameters and thresholds of classifiers so that my first class contains only 100 % sure data, even if it ends up containing only a small number of instances.

I don't mind if the remaining classes contain false alarms or correct rejections.
I don't mind having unclassified data.

I have already searched Stack Overflow and the Weka wiki, but my lack of familiarity with Weka may have made me miss some keywords.
I also tried the task on the well-known "iris" dataset, but I think that in this case any class can be 100 % sure.

So far, I have only succeeded in testing multiple classifiers and tuning them manually, without reaching 100 % correct for my first class. (I checked this in the confusion matrix given by Weka's report.) I know it is possible for my class to contain only 100 % sure data, because I managed it in Matlab with a simple threshold set manually. But I would like to try a bigger dataset, obtain a better threshold, and use the power of Weka.

Any suggestions would be helpful, thanks!


Solution

  • You probably need the "Cost Sensitive Classifier", found among the "meta" classifiers. If you are working in the Explorer, here is the dialog you get.

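    Choose your "classifier" (something beyond ZeroR :) ) and set your "cost matrix". For a 2-class problem this is a 2x2 matrix. By making one off-diagonal entry very large (>>1, let us say 1000) and leaving the other at 1, you make one kind of error 1000 times more expensive than the other. In your case, the expensive error should be an instance of another class being assigned to your "first" class. This pushes the classifier to put an instance in the first class only when it is very confident, which should do the job, although it does not by itself guarantee 100 % correctness on unseen data.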

    [Screenshot: Cost Sensitive Classifier dialog in the Weka Explorer]
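
To make the effect of such a cost matrix concrete, here is a minimal sketch in plain Java with no Weka dependency. It is not Weka's own code: the class and method names are hypothetical, and it assumes that rows of the matrix are the actual class, columns are the predicted class, and that class 0 is the "first" class. It only illustrates the idea of picking the class with the lowest expected cost given the class probabilities from some base classifier.

```java
/** Illustration only: minimum-expected-cost decision with a 2x2 cost matrix. */
final class CostMatrixSketch {

    // Assumed convention: COST[actual][predicted].
    // Predicting class 0 for an instance that is really class 1 costs 1000;
    // the opposite mistake costs 1; correct predictions cost 0.
    static final double[][] COST = {
        {0.0, 1.0},
        {1000.0, 0.0}
    };

    /** Expected cost of predicting each class, given the class probabilities. */
    static double[] expectedCosts(double[][] cost, double[] probs) {
        int n = probs.length;
        double[] out = new double[n];
        for (int predicted = 0; predicted < n; predicted++) {
            for (int actual = 0; actual < n; actual++) {
                out[predicted] += probs[actual] * cost[actual][predicted];
            }
        }
        return out;
    }

    /** Index of the class whose prediction has the lowest expected cost. */
    static int cheapestClass(double[][] cost, double[] probs) {
        double[] costs = expectedCosts(cost, probs);
        int best = 0;
        for (int c = 1; c < costs.length; c++) {
            if (costs[c] < costs[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical probability estimates {P(class 0), P(class 1)}.
        double[][] examples = { {0.99, 0.01}, {0.9995, 0.0005} };
        for (double[] p : examples) {
            System.out.println("P(class 0) = " + p[0]
                + " -> predicted class " + cheapestClass(COST, p));
        }
    }
}
```

With these numbers, class 0 is chosen only when its estimated probability exceeds 1000/1001 (about 0.999), so a larger off-diagonal cost acts like a stricter threshold on the first class. How well this carries over to real data depends on how reliable the base classifier's probability estimates are.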