This question has been asked a few times but they are all in different languages (for example, see here for java and here for python, but Im am trying to achieve this in R.
I have a dataframe of trees. My trees are ordered by depth-first-left-side traversal, as in the code below:
df <- data.frame(
var = c("x2", NA, NA, "x1", NA, "x2", "x2", NA, NA, NA, "x2", NA, "x10", NA, NA, NA, "x1", NA, NA, "x5", NA, NA),
node = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 1, 1, 2, 3, 1, 2, 3),
terminal = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
iteration = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
treeNum = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 2, 2, 2, 3, 3, 3),
stringsAsFactors = FALSE
)
The structure of the dataframe is as follows:
var = variable name (if terminal node or stump, then this is just NA)
node = node number
terminal = if the node is a terminal node or not.
iteration = Iteration number
treeNum = the tree number
As you can see, my dataframe contains 3 trees over 2 iterations.
For clarity, if we look at a single tree (eg, tree number 2 in iteration 1), it would look like the following image (where the nodes are numbered following the traversal method):
And the depth for this tree (ordered in the depth-first-left-side method) would be: 0,1,1,2,3,3,2
Ultimately, I want to make a depth
column to add to my dataframe, but Im struggling to make the code work. Any suggestions as to how I could do this?
I guess if you explicitly do have which nodes are terminal and which are not, then it is possible to decode. So here's a function that can calculate depth with that extra information
node_depth <- function(tree) {
stopifnot(is.data.frame(tree))
stopifnot("terminal" %in% names(tree))
depths <- rep(0, nrow(tree))
recurse <- function(idx, current_depth) {
if (idx > nrow(tree)) {
return(idx)
}
depths[idx] <<- current_depth
if (tree$terminal[idx]) {
return(idx+1)
}
step <- recurse(idx+1, current_depth + 1)
step <- recurse(step, current_depth + 1)
step
}
recurse(1, 1)
depths
}
It looks like the tree you drew is treeNum==2 at iteration==1. We can run the function on that tree with
node_depth(subset(df, iteration==1 & treeNum==2))
# [1] 1 2 2 3 4 4 3
(you can subtract 1 from the vector if you prefer to start at 0).
We can run this on all the trees with
lapply(split(df, ~treeNum + iteration),
function(x) cbind(x, depth=node_depth(x)))
which returns
$`1.1`
var node terminal iteration treeNum depth
1 x2 1 FALSE 1 1 1
2 <NA> 2 TRUE 1 1 2
3 <NA> 3 TRUE 1 1 2
$`2.1`
var node terminal iteration treeNum depth
4 x1 1 FALSE 1 2 1
5 <NA> 2 TRUE 1 2 2
6 x2 3 FALSE 1 2 2
7 x2 4 FALSE 1 2 3
8 <NA> 5 TRUE 1 2 4
9 <NA> 6 TRUE 1 2 4
10 <NA> 7 TRUE 1 2 3
$`3.1`
var node terminal iteration treeNum depth
11 x2 1 FALSE 1 3 1
12 <NA> 2 TRUE 1 3 2
13 x10 3 FALSE 1 3 2
14 <NA> 4 TRUE 1 3 3
15 <NA> 5 TRUE 1 3 3
$`1.2`
var node terminal iteration treeNum depth
16 <NA> 1 TRUE 2 1 1
$`2.2`
var node terminal iteration treeNum depth
17 x1 1 FALSE 2 2 1
18 <NA> 2 TRUE 2 2 2
19 <NA> 3 TRUE 2 2 2
$`3.2`
var node terminal iteration treeNum depth
20 x5 1 FALSE 2 3 1
21 <NA> 2 TRUE 2 3 2
22 <NA> 3 TRUE 2 3 2
So while it's not generally that a depth first pre-order traversal can allow you to reconstruct the tree, it can if you explicitly know which nodes are terminal and which are not.