
AdaBoost in Weka: true positive rate vs. false positive rate confusion


I am using the AdaBoostM1 algorithm in the Weka Experiment Environment with the default setup:

  1. Runs (1-10) -> 10 runs to provide more statistically significant results
  2. Random Split Result Producer
  3. A train percentage to separate the training data from the evaluation data

The problem is with the weighted average TP and FP rates. I get this:

TP:0.8
FP:0.47

But as far as I am aware, if the TP rate is 0.8, the FP rate should be 0.2? I assume this has something to do with the 10 runs, but even if the values are averaged over those runs, the FP rate should still be much lower?

Sorry if this is too simple a question, but to me this looks like an error in the Weka toolkit. Or am I wrong? Thanks.

EDIT:

To avoid asking a new question, and because this is related to the same problem: can anyone explain what the Weighted Avg. values displayed in Weka are?

I have included Atilla's example below. It shows that the weighted averages are not plain averages, e.g. AVG(0.933, 0.422) != 0.77, etc.

Can someone explain what these values actually are?

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.933    0.578    0.776      0.933   0.847      0.429  0.844     0.917     tested_negative
                 0.422    0.067    0.745      0.422   0.538      0.429  0.844     0.696     tested_positive
Weighted Avg.    0.77     0.416    0.766      0.77    0.749      0.429  0.844     0.847


Solution

  • I ran AdaBoostM1 with default parameters on Weka's diabetes data set and got the following results.

    === Detailed Accuracy By Class ===
    
                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.933    0.578    0.776      0.933   0.847      0.429  0.844     0.917     tested_negative
                 0.422    0.067    0.745      0.422   0.538      0.429  0.844     0.696     tested_positive
    Weighted Avg.    0.77     0.416    0.766      0.77    0.749      0.429  0.844     0.847
    

    Notice that the TP rate and FP rate are reported separately for each class value. Since the class attribute in this data set has two (2) values, there are two (2) rows.

    Also notice that:

    0.933  + 0.067 = 1 
    0.578 + 0.422 = 1 
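
Why each of these sums pairs the TP rate of one class with the FP rate of the other can be seen from a confusion matrix. The sketch below is only an illustration: the counts are an assumption (they do not appear in the Weka output above and were picked because they reproduce the rounded rates in the table), and the helper name `rates` is hypothetical.

```python
# Assumed 2x2 confusion matrix (rows = actual class, columns = predicted class).
# These counts are NOT from the Weka output; they are illustrative values
# chosen to reproduce the rounded rates in the table above.
CLASSES = ["tested_negative", "tested_positive"]
CONFUSION = [
    [166, 12],  # actual tested_negative
    [48, 35],   # actual tested_positive
]


def rates(confusion, k):
    """Return (TP rate, FP rate) when class k is treated as the positive class."""
    other = 1 - k
    tp = confusion[k][k]          # class k predicted as k
    fn = confusion[k][other]      # class k predicted as the other class
    fp = confusion[other][k]      # other class predicted as k
    tn = confusion[other][other]  # other class predicted as itself
    return tp / (tp + fn), fp / (fp + tn)


tp_neg, fp_neg = rates(CONFUSION, 0)
tp_pos, fp_pos = rates(CONFUSION, 1)

for name, (tp, fp) in zip(CLASSES, [(tp_neg, fp_neg), (tp_pos, fp_pos)]):
    print(f"{name}: TP rate = {tp:.3f}, FP rate = {fp:.3f}")

# The TP rate of one class and the FP rate of the other share a denominator
# (the number of instances of the first class), so each pair sums to 1.
print(f"{tp_neg + fp_pos:.3f} {tp_pos + fp_neg:.3f}")
```

Within a single row the two rates have different denominators (one per actual class), which is why 0.933 and 0.578 do not have to add up to anything in particular.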
    

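    You are right that a TP rate and an FP rate should add up to one (1), but in a two-class problem this holds across classes, not within one row: the TP rate of one class plus the FP rate of the other class equals 1, because every instance of a class that is missed counts as a false positive for the other class. So in your example, I assume that you have the following class variable: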

    target {A,B}
    
    TP Rate  FP Rate
    0.8      0.47     ..... for A
    0.53     0.2      ..... for B
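
On the EDIT about the Weighted Avg. row: as far as I know, Weka weights each class's value by the number of instances of that class in the evaluated data, so the row is not a plain mean of the two rows. The class counts are not shown in the output above, so the sketch below makes that weighting an explicit assumption, backs the weight out of the TP Rate column, and checks whether the same weight reproduces the other columns. All variable names are illustrative.

```python
# Per-class values and the "Weighted Avg." row copied from the Weka output above.
NEG = {"TP Rate": 0.933, "FP Rate": 0.578, "Precision": 0.776,
       "F-Measure": 0.847, "PRC Area": 0.917}
POS = {"TP Rate": 0.422, "FP Rate": 0.067, "Precision": 0.745,
       "F-Measure": 0.538, "PRC Area": 0.696}
REPORTED = {"TP Rate": 0.77, "FP Rate": 0.416, "Precision": 0.766,
            "F-Measure": 0.749, "PRC Area": 0.847}

# Assumption: weighted = w * negative + (1 - w) * positive, where w is the share
# of tested_negative instances in the evaluated data. Solve for w from TP Rate.
w = (REPORTED["TP Rate"] - POS["TP Rate"]) / (NEG["TP Rate"] - POS["TP Rate"])
print(f"implied share of tested_negative: {w:.2f}")

# If the assumption holds, the same w should reproduce every other column,
# up to the rounding of the three-decimal inputs.
for column in REPORTED:
    weighted = w * NEG[column] + (1 - w) * POS[column]
    plain_mean = (NEG[column] + POS[column]) / 2
    print(f"{column:10s} weighted={weighted:.3f} plain mean={plain_mean:.3f} "
          f"reported={REPORTED[column]}")
```

With a single weight of roughly 0.68 for tested_negative, every column lands within rounding distance of the reported row, while the plain means do not. This also means a weighted FP rate is not 1 minus the weighted TP rate, so a pair like 0.8 and 0.47 is not a sign of an error in Weka.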