Search code examples
ralgorithmigraphpartitioning

Simulating random numbers between ranges that sum to a constant?


This is a follow-up question to a question I posted earlier.

A while back, I posted this question on how to randomly split a graph into connected subgraphs : Randomly Split a Graph into Mini Graphs:

library(igraph)
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))

layout <- layout_on_grid(g, width = n_cols)

n_nodes <- vcount(g)
node_colors <- rep("white", n_nodes)

for (row in 0:(n_rows-1)) {
    start_index <- row * n_cols + 1
    node_colors[start_index:(start_index+2)] <- "orange"  
    node_colors[(start_index+3):(start_index+4)] <- "purple"    
}

node_labels <- 1:n_nodes

plot(g, 
     layout = layout, 
     vertex.color = node_colors,
     vertex.label = node_labels,
     vertex.label.color = "black",
     vertex.size = 15,
     edge.color = "gray",
     main = "Rectangular Undirected Network")

enter image description here

I used one of the (amazing) answers provided to run this function multiple times (https://stackoverflow.com/a/78982967/26653497):

library(igraph)
library(data.table)

f <- function(g, n) {
    m <- length(g)
    dt <- setDT(as_data_frame(g))
    dt <- rbindlist(list(dt, dt[,.(from = to, to = from)]))
    dt[,group := 0L]
    used <- logical(m)
    s <- sample(m, n)
    used[s] <- TRUE
    m <- m - n
    dt[from %in% s, group := .GRP, from]
    
    while (m) {
        dt2 <- unique(
            dt[group != 0L & !used[to], .(grow = to, onto = group)][sample(.N)],
            by = "grow"
        )
        dt[dt2, on = .(from = grow), group := onto]
        used[dt2[[1]]] <- TRUE
        m <- m - nrow(dt2)
    }
    
    unique(dt[,to := NULL])[,.(vertices = .(from)), group]
}


plot_multiple_subgraphs <- function(n_plots = 25, n_rows = 10, n_cols = 5, n_subgraphs = 5) {
    g <- make_lattice(dimvector = c(n_cols, n_rows))
    layout <- layout_on_grid(g, width = n_cols)
    n_nodes <- vcount(g)
    
    color_palette <- c("red", "blue", "green", "yellow", "purple")
    
    par(mfrow = c(5, 5), mar = c(0.5, 0.5, 2, 0.5))
    
    for (i in 1:n_plots) {
        subgraphs <- f(g, n_subgraphs)
        
        node_colors <- rep("white", n_nodes)
        
        for (j in 1:nrow(subgraphs)) {
            nodes <- unlist(subgraphs$vertices[j])
            node_colors[nodes] <- color_palette[j]
        }
        
        plot(g, 
             layout = layout, 
             vertex.color = node_colors,
             vertex.label = NA,  
             vertex.size = 15,   
             edge.color = "gray",
             edge.width = 0.5,  
             main = paste("Partition", i),  
             cex.main = 0.8)     
    }
}

plot_multiple_subgraphs()

enter image description here

I am now wondering - is there a way to add constraints to this function? For example, I want 7 random connected subgraphs such that each graph has minimum 5% of all nodes and maximum 25% of all nodes?

I wrote this small function which generates 7 random numbers that sum to 100 such that the smallest number is larger than 5 and the largest number is smaller than 25:

generate_one_set <- function(n = 7, total = 100, min_val = 5, max_val = 25) {
  repeat {
    points <- sort(c(0, runif(n-1), 1))
    numbers <- diff(points) * total
    if(min(numbers) >= min_val && max(numbers) <= max_val) {
      return(round(numbers, 2))
    }
  }
}

set.seed(123) 
for(i in 1:5) {
    result <- generate_one_set()
    print(result)
    cat("Sum:", sum(result), "\n\n")
}

[1] 12.75  7.90 16.79 18.65 19.24 14.17 10.50
Sum: 100 

[1]  9.48 17.95 10.96  6.45 21.66 14.95 18.54
Sum: 99.99 

[1] 18.38  8.19 14.71 21.72 11.66 11.71 13.64
Sum: 100.01 

[1] 16.81  9.95 13.69 12.67  6.20 19.22 21.47
Sum: 100.01 

[1]  8.63 11.57  8.10 13.74 16.68 21.94 19.33
Sum: 99.99 

Can I somehow introduce this constraint into the earlier function?


Solution

  • If you are referring to the size of each group (which should consist of 0.05 ~ 0.25 of all nodes), probably you can try rmultinom, e.g.,

    K <- 7
    N <- n_rows * n_cols
    minRho <- 0.05
    maxRho <- 0.25
    minSz <- ceiling(minRho * N)
    repeat {
        p <- rmultinom(1, N - minSz * K, rep(1, K)) + minSz
        if (all(p <= maxRho * N)) break
    }
    

    and you will obtain something like

    > p
         [,1]
    [1,]    6
    [2,]    9
    [3,]    6
    [4,]    6
    [5,]    7
    [6,]    7
    [7,]    9
    
    > sum(p)
    [1] 50
    

    It seems your problem is similar to your previous question, so you can replace the code for partitioning (see the corresponding lines in the answer) with the code above, then it should work with size constraints.