
In the as.party function how can I clarify which are the indices for the different nodes?


After fitting a CART model with rpart, I convert it to a party object with the as.party function from the partykit package. The following error appears:

as.party(tree.hunterpb1)

Error in partysplit(varid = which(rownames(obj$split)[j] == names(mf)),  : 
‘index’ has less than two elements

I can only assume that it refers to the partitioning made by factor variables since, as I understand from the literature, the index applies to factors. My tree looks like this:

tree.hunterpb1
n= 354

node), split, n, deviance, yval
  * denotes terminal node

 1) root 354 244402.100 75.45134
   2) hr.11a14>=49.2125 19   3378.322 33.44274 *
   3) hr.11a14< 49.2125 335 205592.400 77.83391
     6) month=April,February,June,March,May 141  58656.390 68.57493 *
     7) month=August,December,January,July,November,October,September 194 126062.800 84.56338
      14) presion.11a14>=800.925 91  74199.080 81.32755
        28) month=January,November,October 16   9747.934 63.13394 *
        29) month=August,December,July,September 75  58025.190 85.20885 *
      15) presion.11a14< 800.925 103  50069.100 87.42223 *

The traceback shows that the first partition's conversion to the party class is done correctly, but the second one, based on the factor variable, fails and produces the error above.

This error has not appeared previously when working on similar data. I can only assume that the as.party function isn't finding the indices. Any advice on how to solve this would be appreciated.


Solution

  • Possibly, the problem is caused by the following situation. (Thanks to Yan Tabachek for e-mailing me a similar example.) If one of the partitioning variables passed on to rpart() is a character variable, then it is processed as if it were a factor by rpart() but not by the conversion in as.party(). As a simple example consider this small data set:

    d <- data.frame(y = c(1:10, 101:110))
    d$x <- rep(c("a", "b"), each = 10)
    

    Fitting the rpart() tree treats the character variable x as a factor:

    library("rpart")
    (rp <- rpart(y ~ x, data = d))
    
    ## n= 20 
    ## 
    ## node), split, n, deviance, yval
    ##       * denotes terminal node
    ## 
    ## 1) root 20 50165.0  55.5  
    ##   2) x=a 10    82.5   5.5 *
    ##   3) x=b 10    82.5 105.5 *
    

    However, the as.party() conversion does not work:

    library("partykit")
    as.party(rp)
    
    ## Error in partysplit(varid = which(rownames(obj$split)[j] == names(mf)),  : 
    ##   'index' has less than two elements
    

    The best fix is to transform x to a factor variable and re-fit the tree. Then the conversion also works smoothly:

    d$x <- factor(d$x)
    rp <- rpart(y ~ x, data = d)
    as.party(rp)
    
    ## Model formula:
    ## y ~ x
    ## 
    ## Fitted party:
    ## [1] root
    ## |   [2] x in a: 5.500 (n = 10, err = 82.5)
    ## |   [3] x in b: 105.500 (n = 10, err = 82.5)
    ## 
    ## Number of inner nodes:    1
    ## Number of terminal nodes: 2
    

    I also added a fix in the development version of partykit on R-Forge to avoid the problem in the first place. It will be included in the next CRAN release (probably 1.0-1 for which a release date has not yet been scheduled).
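When a data set has several predictors, it may be easier to convert all character columns in one step rather than one by one. The following is only a minimal sketch of that idea: `chr_to_factor()` is a hypothetical helper written for this illustration, not a function from rpart or partykit, and the toy data are the same as in the example above. The re-fit is guarded so that the snippet also runs when the packages are not installed.

```r
# Hypothetical helper (not part of rpart or partykit): convert every
# character column of a data frame to a factor, leaving other columns as-is.
chr_to_factor <- function(data) {
  is_chr <- vapply(data, is.character, logical(1))
  if (any(is_chr)) {
    data[is_chr] <- lapply(data[is_chr], factor)
  }
  data
}

# Same toy data as above; stringsAsFactors = FALSE keeps x a character
# variable regardless of the R version.
d <- data.frame(y = c(1:10, 101:110), stringsAsFactors = FALSE)
d$x <- rep(c("a", "b"), each = 10)

# List the character columns that would need converting (here: "x")
names(d)[vapply(d, is.character, logical(1))]

# Convert them all at once and inspect the result
d_fixed <- chr_to_factor(d)
str(d_fixed)

# Re-fit and convert only if both packages are available
if (requireNamespace("rpart", quietly = TRUE) &&
    requireNamespace("partykit", quietly = TRUE)) {
  library("rpart")
  library("partykit")
  rp <- rpart(y ~ x, data = d_fixed)
  print(as.party(rp))
}
```

Applied to the data in the question, the same check would show whether month (or any other predictor) is stored as character rather than factor, which is the assumption behind this answer.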