Tags: classification, weka

OneR WEKA - wrong prediction?


I am trying to rank attributes by their predictive power by running OneR in WEKA iteratively: at every run I remove the chosen attribute to see which attribute is the next best.

I have done this for all ten of my attributes, and three of them get ranked higher than others even though they have a lower percentage of correctly classified instances, a smaller average ROC Area, and less compact rules.

As I understand it, OneR just builds a frequency table of each attribute's values against the class values, so removing other attributes should not affect its choice... but I am probably missing something.
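To make that reasoning concrete, here is a minimal Python sketch of the OneR idea (a hypothetical toy illustration, not WEKA's actual implementation): each attribute is scored on its own frequency table, so the score of one attribute cannot depend on which other attributes are present.

```python
from collections import Counter

def oner_accuracy(rows, attr, label):
    """For one attribute, map each value to its majority class and
    return the training accuracy of that one-rule classifier."""
    by_value = {}
    for row in rows:
        by_value.setdefault(row[attr], Counter())[row[label]] += 1
    # One correct prediction for every instance in the majority class
    # of each attribute value.
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

# Toy weather-style data (made up for illustration)
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

# Each attribute is scored independently of the others, so removing
# one attribute never changes another attribute's accuracy.
ranking = sorted(
    (oner_accuracy(rows, a, "play"), a) for a in ("outlook", "windy")
)
```

On this toy data, `outlook` scores 5/6 and `windy` 4/6, and those numbers stay the same no matter which other attributes are dropped from the data.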

Would anyone have an idea?


Solution

  • As an alternative, you can use the OneR package (available on CRAN; more information here: OneR - Establishing a New Baseline for Machine Learning Classification Models)

    With the option verbose = TRUE you get the accuracy of all attributes, e.g.:

    > library(OneR)
    > example(OneR)
    
    OneR> data <- optbin(iris)
    
    OneR> model <- OneR(data, verbose = TRUE)
    
        Attribute    Accuracy
    1 * Petal.Width  96%     
    2   Petal.Length 95.33%  
    3   Sepal.Length 74.67%  
    4   Sepal.Width  55.33%  
    ---
    Chosen attribute due to accuracy
    and ties method (if applicable): '*'
    
    
    OneR> summary(model)
    
    Rules:
    If Petal.Width = (0.0976,0.791] then Species = setosa
    If Petal.Width = (0.791,1.63]   then Species = versicolor
    If Petal.Width = (1.63,2.5]     then Species = virginica
    
    Accuracy:
    144 of 150 instances classified correctly (96%)
    
    Contingency table:
                Petal.Width
    Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
      setosa               * 50            0          0  50
      versicolor              0         * 48          2  50
      virginica               0            4       * 46  50
      Sum                    50           52         48 150
    ---
    Maximum in each column: '*'
    
    Pearson's Chi-squared test:
    X-squared = 266.35, df = 4, p-value < 2.2e-16
    

    (Full disclosure: I am the author of this package, and I would be very interested in the results you get.)
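As a side note on the ranking procedure from the question: because a one-rule score looks only at a single attribute, iteratively removing the winner and re-running should produce the same order as sorting all attributes by accuracy once. A small self-contained Python sketch of that procedure (hypothetical toy data, not WEKA or the OneR package):

```python
from collections import Counter

def rule_accuracy(rows, attr, label):
    # Majority class per attribute value; fraction classified correctly.
    tables = {}
    for row in rows:
        tables.setdefault(row[attr], Counter())[row[label]] += 1
    return sum(c.most_common(1)[0][1] for c in tables.values()) / len(rows)

def iterative_ranking(rows, attrs, label):
    """Repeatedly pick the best single attribute and remove it,
    mimicking the iterative procedure described in the question."""
    remaining, order = list(attrs), []
    while remaining:
        best = max(remaining, key=lambda a: rule_accuracy(rows, a, label))
        order.append(best)
        remaining.remove(best)
    return order

# Toy data (made up for illustration)
rows = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
```

If your WEKA ranking disagrees with the per-attribute accuracies, it is worth checking for tie-breaking, discretization, or evaluation-mode differences rather than an interaction between attributes.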