Search code examples
rbinary-tree

Find the depth of each node in a binary tree in R?


This question has been asked a few times but they are all in different languages (for example, see here for java and here for python, but Im am trying to achieve this in R.

I have a dataframe of trees. My trees are ordered by depth-first-left-side traversal, as in the code below:

df <- data.frame(
  var = c("x2", NA, NA, "x1", NA, "x2", "x2", NA, NA, NA, "x2", NA, "x10", NA, NA, NA, "x1", NA, NA, "x5", NA, NA),
  node = c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 1, 1, 2, 3, 1, 2, 3),
  terminal = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  iteration = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  treeNum = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE 
)

The structure of the dataframe is as follows:

var = variable name (if terminal node or stump, then this is just NA)

node = node number

terminal = if the node is a terminal node or not.

iteration = Iteration number

treeNum = the tree number

As you can see, my dataframe contains 3 trees over 2 iterations.

For clarity, if we look at a single tree (eg, tree number 2 in iteration 1), it would look like the following image (where the nodes are numbered following the traversal method):

binary tree

And the depth for this tree (ordered in the depth-first-left-side method) would be: 0,1,1,2,3,3,2

Ultimately, I want to make a depth column to add to my dataframe, but Im struggling to make the code work. Any suggestions as to how I could do this?


Solution

  • I guess if you explicitly do have which nodes are terminal and which are not, then it is possible to decode. So here's a function that can calculate depth with that extra information

    node_depth <- function(tree) {
      stopifnot(is.data.frame(tree))
      stopifnot("terminal" %in% names(tree))
      depths <- rep(0, nrow(tree))
      recurse <- function(idx, current_depth) {
        if (idx > nrow(tree)) {
          return(idx)
        }
        depths[idx] <<- current_depth
        if (tree$terminal[idx]) {
          return(idx+1)
        }
        step <- recurse(idx+1, current_depth + 1)
        step <- recurse(step, current_depth + 1)
        step
      }
      recurse(1, 1)
      depths
    }
    

    It looks like the tree you drew is treeNum==2 at iteration==1. We can run the function on that tree with

    node_depth(subset(df, iteration==1 & treeNum==2))
    # [1] 1 2 2 3 4 4 3
    

    (you can subtract 1 from the vector if you prefer to start at 0).

    We can run this on all the trees with

    lapply(split(df, ~treeNum + iteration), 
      function(x) cbind(x, depth=node_depth(x)))
    

    which returns

    $`1.1`
       var node terminal iteration treeNum depth
    1   x2    1    FALSE         1       1     1
    2 <NA>    2     TRUE         1       1     2
    3 <NA>    3     TRUE         1       1     2
    
    $`2.1`
        var node terminal iteration treeNum depth
    4    x1    1    FALSE         1       2     1
    5  <NA>    2     TRUE         1       2     2
    6    x2    3    FALSE         1       2     2
    7    x2    4    FALSE         1       2     3
    8  <NA>    5     TRUE         1       2     4
    9  <NA>    6     TRUE         1       2     4
    10 <NA>    7     TRUE         1       2     3
    
    $`3.1`
        var node terminal iteration treeNum depth
    11   x2    1    FALSE         1       3     1
    12 <NA>    2     TRUE         1       3     2
    13  x10    3    FALSE         1       3     2
    14 <NA>    4     TRUE         1       3     3
    15 <NA>    5     TRUE         1       3     3
    
    $`1.2`
        var node terminal iteration treeNum depth
    16 <NA>    1     TRUE         2       1     1
    
    $`2.2`
        var node terminal iteration treeNum depth
    17   x1    1    FALSE         2       2     1
    18 <NA>    2     TRUE         2       2     2
    19 <NA>    3     TRUE         2       2     2
    
    $`3.2`
        var node terminal iteration treeNum depth
    20   x5    1    FALSE         2       3     1
    21 <NA>    2     TRUE         2       3     2
    22 <NA>    3     TRUE         2       3     2
    

    So while it's not generally that a depth first pre-order traversal can allow you to reconstruct the tree, it can if you explicitly know which nodes are terminal and which are not.