Merging a Data Frame to each element in a List?


I have the following graph:

library(igraph)
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))

layout <- layout_on_grid(g, width = n_cols)

n_nodes <- vcount(g)
node_colors <- rep("white", n_nodes)

for (row in 0:(n_rows-1)) {
    start_index <- row * n_cols + 1
    node_colors[start_index:(start_index+2)] <- "orange"  
    node_colors[(start_index+3):(start_index+4)] <- "purple"    
}
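
The same coloring can also be written without the loop. This is only a minimal sketch: it assumes, as the loop does, that vertex ids run through the lattice five at a time, and the names `col_pos` and `node_colors_vec` are made up for illustration.

```r
n_rows <- 10
n_cols <- 5
n_nodes <- n_rows * n_cols   # equals vcount(g) for this lattice

# Position of each node within its group of n_cols consecutive ids (1..5)
col_pos <- (seq_len(n_nodes) - 1) %% n_cols + 1

# First three positions orange, last two purple, as in the loop
node_colors_vec <- ifelse(col_pos <= 3, "orange", "purple")

table(node_colors_vec)
```

Overall 20 of the 50 nodes are purple, which is a handy check on any per-partition counts computed later.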

node_labels <- 1:n_nodes

plot(g, 
     layout = layout, 
     vertex.color = node_colors,
     vertex.label = node_labels,
     vertex.label.color = "black",
     vertex.size = 15,
     edge.color = "gray",
     main = "Rectangular Undirected Network")

In a previous question (Randomly Split a Graph into Mini Graphs), I learned how to split this graph into 5 connected mini subgraphs:

library(data.table)

f <- function(g, n) {
  m <- length(g)
  dt <- setDT(as_data_frame(g))
  dt <- rbindlist(list(dt, dt[,.(from = to, to = from)]))
  dt[,group := 0L]
  used <- logical(m)
  s <- sample(m, n)
  used[s] <- TRUE
  m <- m - n
  dt[from %in% s, group := .GRP, from]
  
  while (m) {
    dt2 <- unique(
      dt[group != 0L & !used[to], .(grow = to, onto = group)][sample(.N)],
      by = "grow"
    )
    dt[dt2, on = .(from = grow), group := onto]
    used[dt2[[1]]] <- TRUE
    m <- m - nrow(dt2)
  }
  
  unique(dt[,to := NULL])[,.(vertices = .(from), .N), group]
}

Question: Suppose I run this function 25 times and store the results in a list:

generate_multiple_subgraphs <- function(n_iterations = 25, n_rows = 10, n_cols = 5, n_subgraphs = 5) {
    g <- make_lattice(dimvector = c(n_cols, n_rows))
    
    subgraph_list <- lapply(1:n_iterations, function(i) {
        f(g, n_subgraphs)
    })
    
    return(subgraph_list)
}
subgraph_sets <- generate_multiple_subgraphs()

In each of these subgraphs, I want to count the percentage of purple nodes (with regard to the original colors, i.e. the graph that was purple-orange at the start) in each partition.

I was able to get a summary of the original graph:

original_node_data <- data.frame(
    Node = 1:n_nodes,
    Color = node_colors
)

But I am not sure how to merge this data frame with each element of the list of subgraphs to get a result like this:

   subgraph partition total_nodes purple_nodes percent_purple
        <int>     <int>       <int>        <int>          <num>
  1:        1         1          14            8       57.14286
  2:        1         2          12            2       16.66667
  3:        1         3           4            0        0.00000
  4:        1         4           9            6       66.66667
  5:        1         5          11            4       36.36364
 ---                                                           
121:       25         1          13            3       23.07692
122:       25         2           6            6      100.00000
123:       25         3           9            0        0.00000
124:       25         4           8            5       62.50000
125:       25         5          14            6       42.85714

Can someone please show me how to do this?


Solution

  • In each of these subgraphs, I want to count the percentage of purple nodes (with regard to the original colors, i.e. the graph that was purple-orange at the start) in each partition.

    You might want something along these lines:

    subgraph_sets = lapply(subgraph_sets, transform, purple = 
                             vapply(vertices, 
                                    \(j) round(prop.table(table(node_colors[j]))["purple"] * 100L, 2L), 
                                    numeric(1L)))
    

    giving (NA means the partition contains no purple nodes):

    > head(subgraph_sets)
    [[1]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     1       1,2,3,4,5,6,...     9  33.33
    2:     2 10,14,15,19,20,25,...     8 100.00
    3:     3 11,12,13,16,17,18,...    16  12.50
    4:     4 31,32,36,37,41,42,...     9     NA
    5:     5 34,39,40,44,45,48,...     8  87.50
    
    [[2]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     1       1,2,3,4,5,6,...    19  52.63
    2:     2 11,16,17,21,22,26,...    10     NA
    3:     3 23,28,29,33,34,37,...    10  40.00
    4:     4        30,35,40,45,50     5 100.00
    5:     5     41,42,46,47,48,49     6  16.67
    
    [[3]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     2         1, 6, 7,11,12     5     NA
    2:     1       2,3,4,5,8,9,...    11  54.55
    3:     3     16,17,21,22,26,27     6     NA
    4:     4 19,20,23,24,25,28,...    16  75.00
    5:     5 31,32,36,37,41,42,...    12  16.67
    
    [[4]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     1       1,2,3,4,6,7,...    15  13.33
    2:     2  5,10,14,15,19,20,...    10  80.00
    3:     3 26,27,28,29,30,32,...     9  44.44
    4:     4 31,36,37,38,39,40,...     7  28.57
    5:     5 42,43,44,45,46,47,...     9  44.44
    
    [[5]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     1       1,2,3,4,5,6,...    10  40.00
    2:     2 11,12,13,16,17,21,...     7     NA
    3:     3 14,15,18,19,20,24,...     7  85.71
    4:     4 23,26,27,28,29,31,...    16  12.50
    5:     5 30,35,39,40,44,45,...    10  80.00
    
    [[6]]
       group              vertices     N purple
       <int>                <list> <int>  <num>
    1:     1      1, 6,11,16,21,26     6     NA
    2:     2  2, 3, 7, 8,12,13,...     9     NA
    3:     3  4, 5, 9,10,14,15,...     8 100.00
    4:     4 23,24,25,27,28,29,...    14  71.43
    5:     5 31,32,36,37,38,41,...    13  15.38
    

    EDIT: Inside transform() you can create further variables in the same call, as this example does for purple.
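
To get the single long table asked for in the question, one option is to unnest the vertices, join the colors by node, and aggregate. The following is a minimal sketch under stated assumptions: the toy `node_colors` and `subgraph_sets` are hand-written stand-ins shaped like the output of f() (columns group, vertices, N) so that the block runs without igraph, and the names `long` and `result` are illustrative.

```r
library(data.table)

# Toy stand-ins (assumptions): a 2 x 5 lattice colored like the question's,
# and two hand-written partitions shaped like the output of f()
node_colors <- rep(c("orange", "orange", "orange", "purple", "purple"), times = 2)
original_node_data <- data.frame(Node = seq_along(node_colors), Color = node_colors)

subgraph_sets <- list(
  data.table(group = 1:2,
             vertices = list(c(1L, 2L, 3L, 6L, 7L, 8L), c(4L, 5L, 9L, 10L)),
             N = c(6L, 4L)),
  data.table(group = 1:2,
             vertices = list(1:5, 6:10),
             N = c(5L, 5L))
)

# One row per (subgraph, partition, node); idcol numbers the list elements
long <- rbindlist(
  lapply(subgraph_sets, function(d) d[, .(Node = unlist(vertices)), by = group]),
  idcol = "subgraph"
)

# Update join: attach each node's original color
long[as.data.table(original_node_data), on = "Node", Color := i.Color]

# Count nodes and purple nodes per partition, then convert to a percentage
result <- long[, .(total_nodes = .N, purple_nodes = sum(Color == "purple")),
               by = .(subgraph, partition = group)]
result[, percent_purple := 100 * purple_nodes / total_nodes]

print(result)
```

With the real data, the toy definitions would be replaced by the `node_colors`, `original_node_data` and `subgraph_sets` objects from the question. Partitions without purple nodes come out as 0 here rather than NA.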