If I do DF$where <- tree$where after fitting an rpart object using DF as my data, will each row be mapped to its corresponding leaf through the column where?
Here is one way to check that this appears to be true (assuming I have understood your question correctly), using the first example in ?rpart:
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
kyphosis$where <- fit$where
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ where : int 9 7 9 9 3 3 3 3 3 8 ...
> plot(fit)
> text(fit, use.n = TRUE)
And now look at some tables based on the 'where' vector and some logical tests:
First node:
> with(kyphosis, table(where, Start >= 8.5))
where FALSE TRUE
    3     0   29
    5     0   12
    7     0   14
    8     0    7
    9    19    0 # so this is the row that describes that split
> fit$frame[9,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
3 <leaf> 19 19 8 2 0.01 0 0 2.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
3 8.0000000 11.0000000 0.4210526 0.5789474 0.2345679
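Note that the 9 in where is a row position in fit$frame, while the 3 printed at the start of that row is the node number (the row name). A minimal sketch of translating one into the other, refitting the same model so the block is self-contained; node_id is just an illustrative name, not something rpart provides:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where holds row positions in fit$frame, not node numbers.
# The row names of fit$frame are the node numbers used by the tree.
node_id <- as.integer(rownames(fit$frame)[fit$where])

# Cross-tabulate row position against node number: each row position
# should pair with exactly one node number.
print(table(where = fit$where, node = node_id))
```

If you want the labels that match the plotted tree, node_id is probably the more convenient column to attach to the data frame.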
Second node:
> with(kyphosis, table(where, Start >= 8.5, Start >= 14.5))
, ,  = FALSE
where FALSE TRUE
    3     0    0
    5     0   12
    7     0   14
    8     0    7
    9    19    0
, ,  = TRUE
where FALSE TRUE
    3     0   29
    5     0    0
    7     0    0
    8     0    0
    9     0    0
And this is the row of fit$frame for the leaf produced by the second split (row 3 of the frame, whose row name 4 is the node number):
> fit$frame[3,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
4 <leaf> 29 29 0 1 0.01 0 0 1.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
4 29.0000000 0.0000000 1.0000000 0.0000000 0.3580247
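So I would characterize the values of fit$where as identifying the "terminal nodes", which are labeled '<leaf>' in fit$frame and which may or may not be what you were calling the "nodes". More precisely, each value is the row number of fit$frame for the leaf that the observation falls into, not the node number shown in the row names.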
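The same conclusion can be checked programmatically rather than by reading tables. This is only a sketch for the kyphosis fit above (no missing values, default unit weights); leaf_rows, all_leaves, and counts_match are illustrative names:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Rows of fit$frame that are terminal nodes
leaf_rows <- which(fit$frame$var == "<leaf>")

# Every value in fit$where should point at one of those rows
all_leaves <- all(fit$where %in% leaf_rows)

# Count observations per frame row from fit$where and compare
# with the n column that rpart stores for each leaf
counts <- tabulate(fit$where, nbins = nrow(fit$frame))[leaf_rows]
counts_match <- all(counts == fit$frame$n[leaf_rows])

print(c(all_leaves = all_leaves, counts_match = counts_match))
```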
> fit$frame
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Start 81 81 17 1 0.17647059 2 1 1.00000000
2 Start 62 62 6 1 0.01960784 2 2 1.00000000
4 <leaf> 29 29 0 1 0.01000000 0 0 1.00000000
5 Age 33 33 6 1 0.01960784 2 2 1.00000000
10 <leaf> 12 12 0 1 0.01000000 0 0 1.00000000
11 Age 21 21 6 1 0.01960784 2 0 1.00000000
22 <leaf> 14 14 2 1 0.01000000 0 0 1.00000000
23 <leaf> 7 7 3 2 0.01000000 0 0 2.00000000
3 <leaf> 19 19 8 2 0.01000000 0 0 2.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
1 64.00000000 17.00000000 0.79012346 0.20987654 1.00000000
2 56.00000000 6.00000000 0.90322581 0.09677419 0.76543210
4 29.00000000 0.00000000 1.00000000 0.00000000 0.35802469
5 27.00000000 6.00000000 0.81818182 0.18181818 0.40740741
10 12.00000000 0.00000000 1.00000000 0.00000000 0.14814815
11 15.00000000 6.00000000 0.71428571 0.28571429 0.25925926
22 12.00000000 2.00000000 0.85714286 0.14285714 0.17283951
23 3.00000000 4.00000000 0.42857143 0.57142857 0.08641975
3 8.00000000 11.00000000 0.42105263 0.57894737 0.23456790
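Because where indexes the rows of this frame, it can also be used to pull leaf-level information back onto each observation. A hedged sketch, assuming the classification fit above, where yval is the integer code of the predicted class; df, leaf_n, and leaf_class are illustrative names:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Work on a copy so the original data set is left untouched
df <- kyphosis

# Look up each observation's leaf row in fit$frame
df$where  <- fit$where
df$leaf_n <- fit$frame$n[fit$where]

# For a classification tree, yval is the integer code of the predicted
# class, so translate it back into the factor level
df$leaf_class <- levels(kyphosis$Kyphosis)[fit$frame$yval[fit$where]]

# On the training data this should agree with predict()
same <- all(df$leaf_class == as.character(predict(fit, type = "class")))
print(same)
print(head(df))
```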