Interpreting J48 results when the attribute of interest is divided into x levels (WEKA)


I'm new to data mining and Weka. I built a classifier with J48 in the Weka GUI (using the training set) for an attribute of interest with five levels. I have to evaluate the precision of the model, but I'm not sure how to do it. Some output that may be relevant:

=== Detailed Accuracy By Class ===
Precision
0.80
?
0.67
0.56
?
?

First, I would like to know the meaning of the "?" in the precision column. When I tried an attribute of interest with two levels, I got no "?". The tree is also bigger now than it was with two levels. I wonder whether this means that using an attribute of interest with five levels produces a less efficient tree in terms of classification and computation time. That seems plausible, since the share of Correctly Classified Instances with the two-level attribute was up to 72%.

Thank you in advance. All interesting answers will be rewarded!


Solution

  • "I would like to know the meaning of the "?" in the precision column"

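    Note that for these same classes the TP and FP rates are 0. It appears that J48 has not assigned any of your observations to these classes. Precision is TP / (TP + FP), so for a class that is never predicted it becomes 0/0, which is undefined, and Weka prints "?".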

    Are these classes relatively small? If so, you might want to consider using the ClassBalancer filter. This will use weights to make all classes look the same size.

    Of course, after you get the model you need to "convert back" to the real situation. This is similar to correcting for physical oversampling or undersampling. See my answer here: https://stats.stackexchange.com/questions/211174/how-to-exact-prediction-from-over-sampled-dataundoing-oversampling/257507#257507
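
To make the two points above concrete, here is a minimal sketch in plain Python (not Weka code). It assumes a made-up 3-class confusion matrix and made-up class counts. The function names `per_class_precision` and `balancing_weights` are hypothetical, and the weighting formula is only an illustration of the idea behind ClassBalancer (every class ends up with the same total weight while the overall sum of weights is kept), not a copy of Weka's implementation.

```python
from collections import Counter


def per_class_precision(confusion):
    """Per-class precision from a square confusion matrix.

    confusion[i][j] is the number of instances whose actual class is i
    and whose predicted class is j. A class that is never predicted has
    TP + FP = 0, so its precision is 0/0; None is returned for it,
    which corresponds to the "?" in the report.
    """
    n = len(confusion)
    precisions = []
    for j in range(n):
        predicted_as_j = sum(confusion[i][j] for i in range(n))  # TP + FP
        if predicted_as_j == 0:
            precisions.append(None)  # undefined: nothing was assigned to class j
        else:
            precisions.append(confusion[j][j] / predicted_as_j)
    return precisions


def balancing_weights(labels):
    """Per-class instance weights that give every class the same total weight.

    Assumed formula: weight(c) = N / (K * n_c), with N instances,
    K classes present in the data, and n_c instances of class c.
    The sum of all instance weights stays equal to N.
    """
    counts = Counter(labels)
    total = len(labels)
    k = len(counts)
    return {cls: total / (k * n) for cls, n in counts.items()}


# Hypothetical data: the middle class is never predicted (its column is all zeros).
confusion = [
    [8, 0, 2],
    [3, 0, 1],
    [2, 0, 4],
]
for cls, p in enumerate(per_class_precision(confusion)):
    print(cls, "?" if p is None else round(p, 2))

# Hypothetical class sizes: 10, 4 and 6 instances.
labels = ["a"] * 10 + ["b"] * 4 + ["c"] * 6
print(balancing_weights(labels))
```

With these weights the small class "b" counts as much in total as the large class "a", which makes it more likely that the tree assigns some instances to it. The predictions then reflect the balanced class sizes rather than the real ones, which is why the correction mentioned in the answer is still needed afterwards.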