Search code examples
rtreehierarchical-data

How to derive a parent-child table from a hierarchical table using R?


I'm attempting to use the collapsibleTree htmlwidget which requires a dataframe containing a row for each node, i.e. containing all parent-child relationships (known as a DataFrameNetwork in data.tree parlance I believe) in order to take advantage of some of the more useful features.

I am however starting with a dataframe containing a row for each leaf (known as a DataFrameTable?) and am struggling to get it in the right format. I believe the easiest thing to do is to use the data.tree package, but I'm open to suggestions, particularly if they're tidyverse orientated and avoid looping.

Example:

data:

lev1 <- c("A","A","A","B","B","C")
lev2 = c("a", letters[1:5])
lev3 = 1:6
df <- data.frame(lev1, lev2, lev3)

target:

result <- data.frame(parent = c(lev1, lev2),
                     child = c(lev2, lev3)) %>% unique()

(Awful code I know). This particular example has three levels in the hierarchy, but I need the solution to be general enough to apply to any number of levels. I've tried adding a pathString to the dataframe as per the documentation and following up with FromDataFrameTable() and then ToDataFrameNetwork() but I get the feeling I'm going about things all wrong.


Solution

  • library(zoo)
    library(data.table)
    library(dplyr)
    
    rbindlist(lapply(as.data.frame(rollapply(names(df), 2, c), stringsAsFactors = F), 
                     function(x) select(df, c(x)))) %>%
      distinct() %>%
      `colnames<-`(c("parent", "child"))
    

    Output is:

        parent child
     1:      A     a
     2:      A     b
     3:      B     c
     4:      B     d
     5:      C     e
     6:      a     1
     7:      a     2
     8:      b     3
     9:      c     4
    10:      d     5
    11:      e     6
    

    Sample data:

    df <- structure(list(lev1 = structure(c(1L, 1L, 1L, 2L, 2L, 3L), .Label = c("A", 
    "B", "C"), class = "factor"), lev2 = structure(c(1L, 1L, 2L, 
    3L, 4L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"), 
        lev3 = 1:6), .Names = c("lev1", "lev2", "lev3"), row.names = c(NA, 
    -6L), class = "data.frame")