Finding the probability with which an instance is classified in Weka

I am using Weka for classification with the LibSVM classifier and would like some help with the output I get from the evaluation model.

In the example below, my test ARFF file contains 1000 instances, and I want to know the probability with which each instance is classified as yes/no (it's a simple two-class problem).

For example, if instance 1 is classified as 'yes', I want to know the probability with which it was assigned to that class.

Below is the code snippet that I have currently:

        import java.io.File;
        import weka.classifiers.Evaluation;
        import weka.classifiers.functions.LibSVM;
        import weka.core.Instances;
        import weka.core.converters.ArffLoader;

        // Read and load the training ARFF file
        ArffLoader trainArffLoader = new ArffLoader();
        trainArffLoader.setFile(new File("train_clusters.arff"));
        Instances train = trainArffLoader.getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        System.out.println("Loaded Train File");

        // Read and load the test ARFF file
        ArffLoader testArffLoader = new ArffLoader();
        testArffLoader.setFile(new File("test_clusters.arff"));
        Instances test = testArffLoader.getDataSet();
        test.setClassIndex(test.numAttributes() - 1);
        System.out.println("Loaded Test File");

        // Train the classifier on the training set before evaluating it
        LibSVM libsvm = new LibSVM();
        libsvm.buildClassifier(train);

        // Evaluate the trained model on the test set
        Evaluation evaluation = new Evaluation(train);
        evaluation.evaluateModel(libsvm, test);
        System.out.println(evaluation.toSummaryString("\nPrinting the Results\n=====================\n", true));


  • Use the libsvm.distributionForInstance method. It returns a probability estimate for each class index (two in your case).

    For example, to print the estimates for every instance in the test set, use something like this:

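        // Assumes libsvm was trained with libsvm.setProbabilityEstimates(true)
        for (Instance instance : test) {
            double[] distribution = libsvm.distributionForInstance(instance);
            for (int classIndex = 0; classIndex < distribution.length; classIndex++) {
                System.out.print(distribution[classIndex] + " ");
            }
            System.out.println();
        }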

    Note that these are not true probabilities but estimates made by Platt's method (see the question).
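
To give a rough idea of what "estimates made by Platt's method" means: the raw SVM decision value is passed through a fitted sigmoid, 1 / (1 + exp(A*f + B)). The snippet below is a minimal standalone sketch of that mapping only, not the code Weka or LIBSVM actually runs. The class name PlattSigmoid and the values of a and b are made up for illustration; in practice the two parameters are fitted from the training data.

```java
public class PlattSigmoid {

    // Maps a raw SVM decision value f to an estimate of P(positive class | f)
    // using Platt's sigmoid: 1 / (1 + exp(a * f + b)).
    public static double estimate(double decisionValue, double a, double b) {
        return 1.0 / (1.0 + Math.exp(a * decisionValue + b));
    }

    public static void main(String[] args) {
        // Hypothetical parameters chosen for illustration only.
        // A negative a means larger decision values give higher estimates.
        double a = -2.0;
        double b = 0.0;

        // Hypothetical decision values for three instances.
        double[] decisionValues = {-1.5, 0.0, 1.5};

        for (double f : decisionValues) {
            double pYes = estimate(f, a, b);
            double pNo = 1.0 - pYes; // two-class case: the estimates sum to 1
            System.out.printf("f=%.1f  P(yes)=%.3f  P(no)=%.3f%n", f, pYes, pNo);
        }
    }
}
```

An instance lying exactly on the decision boundary (f = 0 with b = 0) gets an estimate of 0.5, and the estimate moves toward 0 or 1 as the instance gets farther from the boundary. This is why the values are better read as calibrated confidence scores than as true probabilities.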