
Recursive aggregation of data tree


I'm using the data.tree package for my analysis instead of nested lists, because my actual data ends up so deeply nested that plain lists are difficult to work with.

I've put together sample data based on this question: Aggregating values on a data tree with R. In my own data I have the to, from, and hours columns, and I need to create actual_hours, which is each parent node's hours minus the sum of its children's hours.

library(data.tree)
# Have: each row links a child (from) to its parent (to)
to <- c("Team1", "Team1-1","Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from <- c("Team1-1", "Team1-1-1","Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c" ,"Team1-1-2a")
hours <- c(NA,150,200,65,20,30, 30)
df <- data.frame(from,to,hours)

# Create data tree
tree <- FromDataFrameNetwork(df)
# Current output
print(tree, "hours")
               levelName hours
1 Team1                     NA
2  °--Team1-1               NA
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30


# Need to create
actual_hours <- c(NA, 35, 170, 65, 20, 30, 30)
df <- data.frame(from, to, hours, actual_hours)
# Create data tree
tree <- FromDataFrameNetwork(df)
# Desired output
print(tree, "hours", "actual_hours")

               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1               NA           NA
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30

I'm not sure how to do this, since it seems to involve moving both up and down the tree. I suspect the height and/or level attributes of the nodes are the way to go, but I'm not certain.


Solution

  • Assume that the data.tree looks like this (note that hours for Team1-1 is 0 here rather than NA)

                   levelName hours
    1 Team1                     NA
    2  °--Team1-1                0
    3      ¦--Team1-1-1        150
    4      ¦   ¦--Team1-1-1a    65
    5      ¦   ¦--Team1-1-1b    20
    6      ¦   °--Team1-1-1c    30
    7      °--Team1-1-2        200
    8          °--Team1-1-2a    30
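
To reproduce the tree assumed above from the question's sample data, one option is to give Team1-1 an hours value of 0 instead of NA before building the tree. This is only a sketch; using 0 for the missing value is the assumption this answer makes, not something stated in the question.

```r
library(data.tree)

# Sample data from the question; each row links a child (from) to its parent (to)
to    <- c("Team1", "Team1-1", "Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from  <- c("Team1-1", "Team1-1-1", "Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c", "Team1-1-2a")
# Assumption: the NA for Team1-1 is replaced by 0
hours <- c(0, 150, 200, 65, 20, 30, 30)

tree <- FromDataFrameNetwork(data.frame(from, to, hours))
print(tree, "hours")
```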
    

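    Here are two versions to try, since I'm not sure which one you want. The non-cumulative version subtracts the children's hours from each parent; the cumulative version subtracts the children's already-computed actual_hours.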


    Non-cumulative

    tree$Do(function(node) {
      node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
    }, traversal = "post-order")
    
    > print(tree, "hours", "actual_hours")
                   levelName hours actual_hours
    1 Team1                     NA           NA
    2  °--Team1-1                0         -350 # see here, -350=0-(150+200)
    3      ¦--Team1-1-1        150           35
    4      ¦   ¦--Team1-1-1a    65           65
    5      ¦   ¦--Team1-1-1b    20           20
    6      ¦   °--Team1-1-1c    30           30
    7      °--Team1-1-2        200          170
    8          °--Team1-1-2a    30           30
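
For readers who prefer to see the arithmetic spelled out, the non-cumulative rule (a node's hours minus the sum of its direct children's hours) can also be written without Aggregate. This is a rough sketch rather than a replacement for the code above; node_hours and child_hours are hypothetical helper names, and it assumes every non-root node carries an hours value, as in the sample data.

```r
library(data.tree)

# Tree assumed in this answer: Team1-1 has hours 0, the root has no hours
to    <- c("Team1", "Team1-1", "Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from  <- c("Team1-1", "Team1-1-1", "Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c", "Team1-1-2a")
hours <- c(0, 150, 200, 65, 20, 30, 30)
tree  <- FromDataFrameNetwork(data.frame(from, to, hours))

# Hypothetical helper: a node's hours as a number, NA if the field is missing
node_hours <- function(node) {
  h <- node$hours
  if (is.null(h)) NA_real_ else as.numeric(h)
}

# Hypothetical helper: sum of the direct children's hours (0 for a leaf)
child_hours <- function(node) {
  if (node$isLeaf) return(0)
  sum(vapply(node$children, node_hours, numeric(1)))
}

# This only reads hours, so the traversal order does not matter here
tree$Do(function(node) {
  node$actual_hours <- node_hours(node) - child_hours(node)
})

print(tree, "hours", "actual_hours")
```

Because each parent is computed from the original hours values only, a change in one node's actual_hours never propagates further up the tree.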
    

    Cumulative

    tree$Do(function(node) {
      node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "actual_hours", aggFun = sum)
    }, traversal = "post-order")
    
    > print(tree, "hours", "actual_hours")
                   levelName hours actual_hours
    1 Team1                     NA           NA
    2  °--Team1-1                0         -205 # -205=0-(35+170)
    3      ¦--Team1-1-1        150           35
    4      ¦   ¦--Team1-1-1a    65           65
    5      ¦   ¦--Team1-1-1b    20           20
    6      ¦   °--Team1-1-1c    30           30
    7      °--Team1-1-2        200          170
    8          °--Team1-1-2a    30           30
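
The cumulative version depends on traversal = "post-order", which visits children before their parent, so each child's actual_hours is already set by the time the parent reads it. A small sketch to inspect that visiting order on the sample tree (visit_order is just an illustrative variable name):

```r
library(data.tree)

# Sample data from the question; each row links a child (from) to its parent (to)
to    <- c("Team1", "Team1-1", "Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from  <- c("Team1-1", "Team1-1-1", "Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c", "Team1-1-2a")
hours <- c(NA, 150, 200, 65, 20, 30, 30)
tree  <- FromDataFrameNetwork(data.frame(from, to, hours))

# Collect node names in the order a post-order traversal visits them
visit_order <- unname(tree$Get("name", traversal = "post-order"))
print(visit_order)
```

Leaves come first and the root, Team1, comes last.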