Search code examples
ralgorithmigraph

Randomly Splitting a Graph according to Conditions


I found this picture online:

enter image description here

I am trying to use simulate the above partitioning process in R using evolutionary algorithms.

For example, suppose I have a similar graph network:

library(igraph)    
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))
layout <- layout_on_grid(g, width = n_cols)
n_nodes <- vcount(g)
node_colors <- rep("white", n_nodes)
for (row in 0:(n_rows-1)) {
    start_index <- row * n_cols + 1
    node_colors[start_index:(start_index+2)] <- "orange"  
    node_colors[(start_index+3):(start_index+4)] <- "purple"    
}
node_labels <- 1:n_nodes
plot(g, 
     layout = layout, 
     vertex.color = node_colors,
     vertex.label = node_labels,
     vertex.label.color = "black",
     vertex.size = 15,
     edge.color = "gray",
     main = "Rectangular Undirected Network")

enter image description here

I would like to use an evolutionary algorithm to partition this graph network into 5 induced subgraphs such that each subgraph has at least 5 nodes and purple "wins" in the majority of all subgraphs. I would like to identify different such options (e.g. 5th option in the first photo) in which purple wins and each subgraph has at least 5 nodes.

Here is the approach I am currently using:

library(igraph)
library(GA)
library(ggplot2)
library(dplyr)
library(gridExtra)

# original graph
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))
n_nodes <- vcount(g)

node_colors <- rep("white", n_nodes)
for (row in 0:(n_rows-1)) {
    start_index <- row * n_cols + 1
    node_colors[start_index:(start_index+2)] <- "orange"  
    node_colors[(start_index+3):(start_index+4)] <- "purple"    
}

# define fitness function based on constraints
fitness <- function(solution) {
    subgraphs <- split(1:n_nodes, solution)
    
    # check if all subgraphs have at least 5 nodes
    if (any(sapply(subgraphs, length) < 5)) {
        return(-Inf)
    }
    
    # count purple wins
    purple_wins <- sum(sapply(subgraphs, function(sg) {
        sum(node_colors[sg] == "purple") > sum(node_colors[sg] == "orange")
    }))
    
  
    return(purple_wins)
}

# genetic Algorithm
ga_result <- ga(
    type = "permutation",
    fitness = fitness,
    min = 1,
    max = 5,
    popSize = 50,
    maxiter = 1000,
    run = 100,
    pmutation = 0.2,
    monitor = FALSE,
    keepBest = TRUE
)

   

# multiple solutions
n_solutions <- 3  # Number of solutions to display
solutions <- ga_result@solution[1:n_solutions,]

for (i in 1:n_solutions) {
    cat("Solution", i, "\n")
    cat("Fitness score:", ga_result@fitness[i], "\n\n")
    plot_subgraphs(solutions[i,], i)
    cat("\n")
}

The code successfully ran:

Solution 1 
Fitness score: 2 

Subgraph 1 : 4 9 14 19 24 29 34 39 44 49 
Purple: 10 Orange: 0 

Subgraph 2 : 3 8 13 18 23 28 33 38 43 48 
Purple: 0 Orange: 10 

Subgraph 3 : 2 7 12 17 22 27 32 37 42 47 
Purple: 0 Orange: 10 

Subgraph 4 : 1 6 11 16 21 26 31 36 41 46 
Purple: 0 Orange: 10 

Subgraph 5 : 5 10 15 20 25 30 35 40 45 50 
Purple: 10 Orange: 0 


Solution 2 
Fitness score: 2 

Subgraph 1 : 5 10 15 20 25 30 35 40 45 50 
Purple: 10 Orange: 0 

Subgraph 2 : 2 7 12 17 22 27 32 37 42 47 
Purple: 0 Orange: 10 

Subgraph 3 : 4 9 14 19 24 29 34 39 44 49 
Purple: 10 Orange: 0 

Subgraph 4 : 1 6 11 16 21 26 31 36 41 46 
Purple: 0 Orange: 10 

Subgraph 5 : 3 8 13 18 23 28 33 38 43 48 
Purple: 0 Orange: 10 

But there are some major problems in the approach I used. For example:

enter image description here

The fitness function seems to be only exploring vertical partitions and can not find a single valid solution in which purple wins overall.

Can someone please show me how to correct these problems?


Related:


Solution

  • Disclaimer

    I am not familiar with GA, but I am suspicious of its feasibility in your question. Your question needs to consider the connectivity of each sub-group, which should be particularly designed with that specific constraint, instead of relying on the random optimization in GA.


    As said before, the answer below will not be a Genetic Algorithm approach, but it should work as you described in the question

    f <- function(rndSeed = 0) {
      set.seed(rndSeed)
      purplenodes <- as.character(c(outer(4:5, seq(0, by = 5, length.out = 10), `+`)))
      minsubgsz <- 5
      nrsubg <- 5
      g <- g %>%
        set_vertex_attr("name", value = seq.int(vcount(.)))
      repeat {
        gg <- g
        valid <- TRUE
        vlst <- setNames(vector("list", nrsubg), seq.int(nrsubg))
        szsubg <- rmultinom(1, vcount(g) - nrsubg * minsubgsz, rep(1, nrsubg)) + minsubgsz
        for (i in seq_along(szsubg)) {
          deg <- degree(gg)
          idx <- which(deg == min(deg))
          vlst[[i]] <- names(bfs(gg,
            # add more randomness for trials, and pick a new random root from vertices with the minimum degree
            V(gg)[idx[sample.int(length(idx), 1)]],
            callback = \(graph, data, extra) data["rank"] == szsubg[i]
          )$order)
          if (is_connected(induced_subgraph(gg, vlst[[i]]))) {
            gg <- induced_subgraph(gg, V(gg)[!names(V(gg)) %in% vlst[[i]]])
          } else {
            valid <- FALSE
            break
          }
        }
        if (valid) {
          purplewin <- sapply(vlst, \(x) mean(x %in% purplenodes)) > 0.5
          if (sum(purplewin) >= 0.5 * nrsubg) {
            break
          }
        }
      }
      
      # visualize the partitions
      g %>%
        set_vertex_attr("color",
          value = with(stack(vlst), ind[match(names(V(.)), values)])
        ) %>%
        plot(
          layout = layout,
          vertex.label = V(.)$name,
          vertex.label.color = "black",
          vertex.size = 15,
          edge.color = "gray",
          main = "Rectangular Undirected Network"
        )
    
      # return the desired clustering
      return(vlst)
    }
    

    where

    • rmultinom is used to generate random sizes of each group, where minsubgsz denotes the min sub-graph size
    • bfs is used to iterate through the connected vertices, while following the size of each group in szsubg
    • purplewin checks if the purple wins in each of group
    • The repeat loop keeps running the above process until it finds a partition where purple wins by least 3 out of 5 groups.

    Output Examples

    > f()
    $`1`
    [1] "5"  "4"  "10" "3"  "9"  "15" "2"
    
    $`2`
    [1] "1"  "6"  "7"  "11" "8"  "12" "16" "13"
    
    $`3`
     [1] "14" "19" "18" "20" "24" "17" "23" "25" "29" "22" "28"
    
    $`4`
     [1] "30" "35" "34" "40" "33" "39" "45" "32" "38" "44" "50" "27"
    
    $`5`
     [1] "49" "48" "43" "47" "42" "46" "37" "41" "36" "31" "26" "21"
    

    enter image description here

    and

    > f(2)
    $`1`
     [1] "50" "45" "49" "40" "44" "48" "35" "39" "43" "47"
    
    $`2`
     [1] "46" "41" "36" "42" "31" "37" "26" "32" "38" "21" "27" "33"
    
    $`3`
    [1] "34" "29" "24" "28" "30" "19"
    
    $`4`
    [1] "25" "20" "15" "10" "14" "5"  "9"
    
    $`5`
     [1] "4"  "3"  "2"  "8"  "1"  "7"  "13" "6"  "12" "18" "11" "17" "23" "16" "22"
    

    enter image description here