data.frame with nested factors into igraph bubble graph

I have a standard data.frame with some categorical columns and one numeric column. It represents a nested experimental design (though that doesn't really matter), like this:

data = data.frame(toplevel=c("A","A","A","A", "B", "B", "B"), 
                  second = c("A1", "A1", "A2", "A2", "B1", "B1", "B2"),
                  experiments = paste0("exp_00", 1:7),
                  values = runif(7, 1,100))
####   toplevel second experiments   values
#### 1        A     A1     exp_001 12.25664
#### 2        A     A1     exp_002 62.60764
#### 3        A     A2     exp_003 61.31820
#### 4        A     A2     exp_004 62.71456
#### 5        B     B1     exp_005 86.23062
#### ...

I would like to make the same kind of plot as the code in this linked example produces (the left plot!):

I don't know how to turn my data.frame into an igraph-compatible data.frame and then proceed with the suggested plotting code (I don't have "from" and "to" columns). Given my example data, my desired output would look like the plot on the right, with circle size representing the values column. I tried graph_from_data_frame without success.

Thanks. Edit: my attempted code so far (it only draws part of the graph):

library(tidyverse); library(igraph); library(ggraph)
# Edge list: toplevel -> second, then second -> experiments
edges = bind_rows(data[, 1:2] %>% setNames(c("from", "to")), data[, 2:3] %>% setNames(c("from", "to")))
# One row per node; parent nodes get the sum of their children's values
vertices = bind_rows(
  data %>% group_by(toplevel) %>% summarize(values=sum(values)) %>% select(name=toplevel, values),
  data %>% group_by(second) %>% summarize(values=sum(values)) %>% select(name=second, values),
  data %>% select(name=experiments, values)
)
mygraph = graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
ggraph(mygraph, layout = 'circlepack', weight="values") + 
  geom_node_circle()


  • I think your current attempt is on the right track. One trick that may help is adding an extra "root" node that both of your top-level nodes connect to. At the moment, because there is no connection at all between your top-level nodes, ggraph only plots one set of them:

    # Same edge list as before, plus edges from an artificial "root" to each top-level node
    edges = bind_rows(
        data[,1:2] %>% setNames(c("from", "to")), 
        data[, 2:3] %>% setNames(c("from", "to")),
        data.frame(from = c("root", "root"), to = c("A", "B")))
    # The root vertex carries the grand total of values
    vertices = bind_rows(
        data.frame(name = "root", values = sum(data$values)),
        data %>% group_by(toplevel) %>% summarize(values=sum(values)) %>% select(name=toplevel, values),
        data %>% group_by(second) %>% summarize(values=sum(values)) %>% select(name=second, values),
        data %>% select(name=experiments, values))
    mygraph = graph_from_data_frame(edges, directed = TRUE, vertices = vertices)
    ggraph(mygraph, layout = 'circlepack', weight="values") + 
        geom_node_circle(aes(fill = depth)) +
        geom_node_label(aes(label = name))
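
The root edges in the answer are typed by hand (`c("A", "B")`), which stops working as soon as the data has other top-level groups. Here is a minimal base-R sketch that derives them from the data instead. The fixed `values` and the final sanity check are my own additions for illustration, not part of the original answer. Calling `unique()` on the edge list is also an assumption on my part: it drops the repeated parent/child rows so that every non-root node has exactly one incoming edge, as in a rooted tree.

```r
# Example data mirroring the question; the values are fixed here
# (hypothetical numbers) so that the result is reproducible.
data <- data.frame(
  toplevel    = c("A", "A", "A", "A", "B", "B", "B"),
  second      = c("A1", "A1", "A2", "A2", "B1", "B1", "B2"),
  experiments = paste0("exp_00", 1:7),
  values      = c(10, 20, 30, 40, 50, 60, 70),
  stringsAsFactors = FALSE
)

# One edge per distinct parent/child pair at each level of the nesting.
# The "root" edges are built from the distinct top-level names.
edges <- unique(rbind(
  data.frame(from = "root", to = unique(data$toplevel),
             stringsAsFactors = FALSE),
  data.frame(from = data$toplevel, to = data$second,
             stringsAsFactors = FALSE),
  data.frame(from = data$second, to = data$experiments,
             stringsAsFactors = FALSE)
))

# Sanity check: a rooted tree has one edge fewer than it has nodes,
# and no node appears twice as a child.
nodes <- unique(c(edges$from, edges$to))
stopifnot(nrow(edges) == length(nodes) - 1, !anyDuplicated(edges$to))
```

The resulting `edges` can be passed to `graph_from_data_frame()` together with the `vertices` table from the answer, in place of the hand-built edge list.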