Find the depth of each node in a binary tree in R?

This question has been asked a few times but they are all in different languages (for example, see here for java and here for python, but Im am trying to achieve this in R.

I have a dataframe of trees. My trees are ordered by depth-first-left-side traversal, as in the code below:

df <- data.frame(
  var = c("x2", NA, NA, "x1", NA, "x2", "x2", NA, NA, NA, "x2", NA, "x10", NA, NA, NA, "x1", NA, NA, "x5", NA, NA),
  node = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 1, 1, 2, 3, 1, 2, 3),
  terminal = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  iteration = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  treeNum = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE 
)

The structure of the dataframe is as follows:

var = variable name (if terminal node or stump, then this is just NA)

node = node number

terminal = if the node is a terminal node or not.

iteration = Iteration number

treeNum = the tree number

As you can see, my dataframe contains 3 trees over 2 iterations.

For clarity, if we look at a single tree (eg, tree number 2 in iteration 1), it would look like the following image (where the nodes are numbered following the traversal method):

And the depth for this tree (ordered in the depth-first-left-side method) would be: 0,1,1,2,3,3,2

Ultimately, I want to make a depth column to add to my dataframe, but Im struggling to make the code work. Any suggestions as to how I could do this?

Solution

I guess if you explicitly do have which nodes are terminal and which are not, then it is possible to decode. So here's a function that can calculate depth with that extra information

node_depth <- function(tree) {
  stopifnot(is.data.frame(tree))
  stopifnot("terminal" %in% names(tree))
  depths <- rep(0, nrow(tree))
  recurse <- function(idx, current_depth) {
    if (idx > nrow(tree)) {
      return(idx)
    }
    depths[idx] <<- current_depth
    if (tree$terminal[idx]) {
      return(idx+1)
    }
    step <- recurse(idx+1, current_depth + 1)
    step <- recurse(step, current_depth + 1)
    step
  }
  recurse(1, 1)
  depths
}

It looks like the tree you drew is treeNum==2 at iteration==1. We can run the function on that tree with

node_depth(subset(df, iteration==1 & treeNum==2))
# [1] 1 2 2 3 4 4 3

(you can subtract 1 from the vector if you prefer to start at 0).

We can run this on all the trees with

lapply(split(df, ~treeNum + iteration), 
  function(x) cbind(x, depth=node_depth(x)))

which returns

$`1.1`
   var node terminal iteration treeNum depth
1   x2    1    FALSE         1       1     1
2 <NA>    2     TRUE         1       1     2
3 <NA>    3     TRUE         1       1     2

$`2.1`
    var node terminal iteration treeNum depth
4    x1    1    FALSE         1       2     1
5  <NA>    2     TRUE         1       2     2
6    x2    3    FALSE         1       2     2
7    x2    4    FALSE         1       2     3
8  <NA>    5     TRUE         1       2     4
9  <NA>    6     TRUE         1       2     4
10 <NA>    7     TRUE         1       2     3

$`3.1`
    var node terminal iteration treeNum depth
11   x2    1    FALSE         1       3     1
12 <NA>    2     TRUE         1       3     2
13  x10    3    FALSE         1       3     2
14 <NA>    4     TRUE         1       3     3
15 <NA>    5     TRUE         1       3     3

$`1.2`
    var node terminal iteration treeNum depth
16 <NA>    1     TRUE         2       1     1

$`2.2`
    var node terminal iteration treeNum depth
17   x1    1    FALSE         2       2     1
18 <NA>    2     TRUE         2       2     2
19 <NA>    3     TRUE         2       2     2

$`3.2`
    var node terminal iteration treeNum depth
20   x5    1    FALSE         2       3     1
21 <NA>    2     TRUE         2       3     2
22 <NA>    3     TRUE         2       3     2

So while it's not generally that a depth first pre-order traversal can allow you to reconstruct the tree, it can if you explicitly know which nodes are terminal and which are not.