r party ctree

R partykit::ctree(): how are ties broken when several candidate splitting variables have an identical p-value?


For a node x in a partykit::ctree object, I use the following lines to get the node's splitting variable:

k=info_node(x)
names(k$p.value)

However, the splitting variable returned by this code differs from the one shown on the tree drawn by plot. It turns out that three columns in k$criterion share the minimum p-value, i.e.

inds=which(k$criterion['p.value',]==k$p.value)
length(inds) #3

It seems that info_node(x) returns the 1st of the three variables as names(k$p.value), but plot chooses the 3rd one. I wonder whether this discrepancy has one of two causes:

  1. Multiple variables have the minimum p-value, and there is an internal method that breaks such a tie so that only one splitting variable is selected.

  2. These three variables have slightly different p-values, but because of the limited p-value precision in k$criterion, they appear to have the same p-value.

Any insight is appreciated!


Solution

  • The comparisons are done internally on the log-p-value scale, which makes them more reliable when the p-values are tiny. If ties (within machine precision) still remain on that scale, they are broken based on the size of the corresponding test statistic.
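
The two-stage rule described above can be sketched in base R. This is only an illustration, not partykit's actual code: the matrix crit, its row names, its values, and the helper select_split() are all hypothetical, and the sketch assumes that the larger test statistic wins the tie.

```r
# Hypothetical criterion matrix, one column per candidate variable.
# All numbers are made up for illustration; they do not come from a real tree.
crit <- rbind(
  statistic   = c(x1 = 1650, x2 = 1690, x3 = 1693, x4 = 3.1),
  log_p_value = c(x1 = -800, x2 = -820, x3 = -820, x4 = -0.2)
)

# On the raw scale the three tiny p-values underflow to exactly 0,
# so they look tied and which.min() simply reports the first one.
raw_p <- exp(crit["log_p_value", ])
names(which(raw_p == min(raw_p)))   # "x1" "x2" "x3"
names(which.min(raw_p))             # "x1"

# Hypothetical helper: compare on the log scale first, then break any
# remaining tie by the test statistic (assumption: larger statistic wins).
select_split <- function(crit, tol = sqrt(.Machine$double.eps)) {
  logp <- crit["log_p_value", ]
  # columns whose log-p-value equals the minimum up to the tolerance
  tied <- which(abs(logp - min(logp)) < tol)
  # among the tied columns, keep the one with the largest statistic
  best <- tied[which.max(crit["statistic", tied])]
  names(best)
}

select_split(crit)                  # "x3"
```

In this made-up example the log scale already separates x1 from x2 and x3, and the statistic then decides between the remaining two. On a real node, comparing the test statistics of the tied columns in k$criterion (for example k$criterion['statistic', inds], assuming the matrix has a row with that name) should show which variable the tree actually split on.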