How to get classification values in RWeka?


Can anybody explain how to get the class proportions in each leaf of a decision tree built with J48 from the RWeka package?

For example, take a J48 tree fitted to the iris dataset in R:

 library(RWeka)
 m1 <- J48(Species ~ ., data = iris)
 m1

For prediction, I want to use the class proportions in a leaf. I tried the partykit package, but it still looks too complicated just to get the proportions in each leaf.

 library(partykit)
 pres <- as.party(m1)
 partykit:::.list.rules.party(pres)

At least I get the rule for each leaf in the list, but I can't find the probabilities.

pres

Model formula:
Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

Fitted party:
[1] root
|   [2] Petal.Width <= 0.6: setosa (n = 50, err = 0.0%)
|   [3] Petal.Width > 0.6
|   |   [4] Petal.Width <= 1.7
|   |   |   [5] Petal.Length <= 4.9: versicolor (n = 48, err = 2.1%)
|   |   |   [6] Petal.Length > 4.9
|   |   |   |   [7] Petal.Width <= 1.5: virginica (n = 3, err = 0.0%)
|   |   |   |   [8] Petal.Width > 1.5: versicolor (n = 3, err = 33.3%)
|   |   [9] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)

Number of inner nodes:    4
Number of terminal nodes: 5

So as a prediction I want, for example, for a new data point with Petal.Width > 0.6, Petal.Width <= 1.7 and Petal.Length <= 4.9, the result versicolor 97.9% and 2.1% other. How can I get these predictions?


Solution

  • What you describe is a region of the feature space (a leaf), not a single point. If you fully specify a point, you can simply pass it to the predict function. For example, I will generate a point that meets those conditions but is unlike the other iris points, and then classify it.

    ## Generate wild new point
    NewPoint = iris[1,]
    NewPoint[1,3:4] = c(2.0,1.7)
    NewPoint
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5            2         1.7  setosa
    
    ## Look at where the new point is
    plot(iris[,3:4], pch=20, col=rainbow(3, alpha=0.3)[iris$Species])
    points(NewPoint[,3:4], pch=16, col="orange")
    

    [Figure: position of the new point]

    ## Get the probability from the model
    predict(m1, newdata = NewPoint, type = "probability")
      setosa versicolor  virginica
    1      0  0.9791667 0.02083333
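
If the proportions for every leaf are wanted at once, rather than for one fully specified point, a minimal sketch along the following lines may help. It assumes that the party object returned by as.party(m1) supports predict() with type = "node" and type = "prob", as partykit's constparty objects do, and that RWeka has a working Java setup. The names leaf, counts and props are only illustrative.

```r
library(RWeka)     # J48 needs a working Java / rJava installation
library(partykit)

m1   <- J48(Species ~ ., data = iris)
pres <- as.party(m1)

# Terminal node id that each training row falls into
leaf <- predict(pres, newdata = iris, type = "node")

# Class counts per leaf, converted to row-wise proportions
counts <- table(leaf = leaf, Species = iris$Species)
props  <- prop.table(counts, margin = 1)
round(props, 3)

# Class probabilities for a new observation, taken from the leaf it lands in
NewPoint <- iris[1, ]
NewPoint[1, 3:4] <- c(2.0, 1.7)
predict(pres, newdata = NewPoint, type = "prob")
```

Each row of props corresponds to one terminal node id from the printed tree. For node 5 the versicolor share is 47/48, which is the 0.9791667 returned by predict(m1, ..., type = "probability") above.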