
Incorrect class prediction using Weka


I am using the WEKA API (weka-stable-3.8.1).
I have been trying to use the J48 decision tree (Weka's C4.5 implementation). My data has around 22 features and a nominal class with two possible values: yes or no.
When I evaluate with the following code:

Classifier model = (Classifier) weka.core.SerializationHelper.read(trainedModelDestination);
Evaluation evaluation = new Evaluation(trainingInstances);
evaluation.evaluateModel(model, testingInstances);
System.out.println("Number of correct predictions : "+evaluation.correct());


All predictions are reported as correct. But when I classify the same test cases individually with:

for(Instance i : testingInstances){
    double predictedClassLabel = model.classifyInstance(i);
    System.out.println("predictedClassLabel : "+predictedClassLabel);
}


I always get the same output, i.e. 0.0.

Why is this happening?


Solution

  • Should have updated much sooner. Here's how I fixed this:

    During the training phase, the model learns from your training set, and that set includes categorical/nominal features as well.

    Most algorithms require numerical values to work. To deal with this, each nominal value is mapped to a specific numerical value.

    This mapping is fixed during the training phase, and the Instances object holds it. During the testing phase you have to use the same Instances object that was created during the training phase. Otherwise, the classifier will not correctly map your nominal values to their expected numerical values at test time.

    Note:

    This kind of integer encoding can bias training in non-tree-based models, so something like one-hot encoding should be used in such cases.
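
To make the mapping described in this answer concrete, here is a minimal plain-Java sketch. It is not Weka code: the class `NominalIndexDemo` and its `encode`/`decode` helpers are hypothetical stand-ins, and it assumes a nominal value is stored as its position in the attribute's declared list of labels. It shows why the same numeric prediction reads as a different label when the header differs from the one used in training.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in (not Weka code) for how a nominal attribute
// stores its labels as numeric indices.
class NominalIndexDemo {

    // Encode a label as its position in the attribute's declared value list.
    static double encode(List<String> declaredValues, String label) {
        int index = declaredValues.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return index;
    }

    // Decode a numeric index (such as a prediction) back into its label.
    static String decode(List<String> declaredValues, double index) {
        return declaredValues.get((int) index);
    }

    public static void main(String[] args) {
        // Header as declared at training time (assumed order).
        List<String> trainingHeader = Arrays.asList("yes", "no");
        // A header rebuilt separately at test time with a different order.
        List<String> rebuiltHeader = Arrays.asList("no", "yes");

        double prediction = 0.0; // what the classifier returned

        System.out.println("training header: " + decode(trainingHeader, prediction)); // yes
        System.out.println("rebuilt header:  " + decode(rebuiltHeader, prediction));  // no
    }
}
```

In Weka itself, `classifyInstance` returns this kind of index for a nominal class, so `0.0` refers to the first declared class label rather than signalling an error. Assuming the test set shares the training header, `testingInstances.classAttribute().value((int) predictedClassLabel)` should give the label text.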