
How to identify the details of incorrectly classified instances in Weka GUI?


I want to get the details (unique ID) of the incorrectly classified instances using the Weka GUI. I am following the answers to this question, which suggest using the StringToNominal filter in the Preprocess tab to convert the unique ID, which is a string. However, if I do that, will the classifier also treat the unique ID column as a feature during classification?

Please suggest the correct way to approach this.

I am happy to provide examples if needed.


Solution

  • Let's suppose you want to (1) add an instance ID, (2) not use that instance ID in the model, and (3) see the individual predictions, with the instance ID and maybe some other attributes.

    We’re going to show this with a smaller data set. Open iris.arff, for example.

    Use the AddID filter in the Preprocess tab (under filters, unsupervised, attribute). ID will be the first attribute.

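    Now we need to ignore it during the modeling. In the Classify tab, choose FilteredClassifier (under meta), set its filter to Remove, and set the Remove filter's attributeIndices to 1 so that the ID is dropped before the base classifier sees the data. Because the removal happens inside the classifier, the ID is still part of the data set and can be printed alongside the predictions.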

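    And we need to output the predictions with the ID variable so we can see what happened. In the classifier evaluation options, ask for the predictions to be output (CSV format here) together with the attributes you want listed. Here we are outputting all the attributes, although only the ID is really needed.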

    This produces the following detail in the output window:

    === Predictions on test split ===
    
    inst#,actual,predicted,error,prediction,ID,sepallength,sepalwidth,petallength,petalwidth
    1,2:Iris-versicolor,2:Iris-versicolor,,0.968,53,6.9,3.1,4.9,1.5
    2,3:Iris-virginica,3:Iris-virginica,,0.968,131,7.4,2.8,6.1,1.9
    3,2:Iris-versicolor,2:Iris-versicolor,,0.968,59,6.6,2.9,4.6,1.3
    4,1:Iris-setosa,1:Iris-setosa,,1,36,5,3.2,1.2,0.2
    5,3:Iris-virginica,3:Iris-virginica,,0.968,101,6.3,3.3,6,2.5
    6,2:Iris-versicolor,2:Iris-versicolor,,0.968,88,6.3,2.3,4.4,1.3
    7,1:Iris-setosa,1:Iris-setosa,,1,42,4.5,2.3,1.3,0.3
    8,1:Iris-setosa,1:Iris-setosa,,1,8,5,3.4,1.5,0.2
    

    and so on.
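
Once the predictions are in this CSV form, picking out the misclassified instances can be scripted. The following is a minimal sketch, not part of Weka: it assumes the predictions text has been copied from the output window with the header row intact, and the function name `misclassified_ids` is hypothetical. It compares the `actual` and `predicted` columns directly rather than relying on how the `error` column is marked.

```python
import csv
import io


def misclassified_ids(predictions_csv, id_column="ID"):
    """Return the ID values of rows where actual and predicted differ.

    Assumes the header names used in the output above
    (actual, predicted, and an ID column added by AddID).
    """
    reader = csv.DictReader(io.StringIO(predictions_csv.strip()))
    return [
        row[id_column]
        for row in reader
        if row["actual"].strip() != row["predicted"].strip()
    ]


# Rows copied from the output above. All of them are classified
# correctly, so no IDs are returned for this sample.
sample = """\
inst#,actual,predicted,error,prediction,ID,sepallength,sepalwidth,petallength,petalwidth
1,2:Iris-versicolor,2:Iris-versicolor,,0.968,53,6.9,3.1,4.9,1.5
2,3:Iris-virginica,3:Iris-virginica,,0.968,131,7.4,2.8,6.1,1.9
4,1:Iris-setosa,1:Iris-setosa,,1,36,5,3.2,1.2,0.2
"""

print(misclassified_ids(sample))
```

Running it on the full predictions text would list the IDs of every instance whose predicted class differs from the actual one; those IDs can then be looked up in the original data.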