I found this picture online:
I am trying to use simulate the above partitioning process in R using evolutionary algorithms.
For example, suppose I have a similar graph network:
library(igraph)
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))
layout <- layout_on_grid(g, width = n_cols)
n_nodes <- vcount(g)
node_colors <- rep("white", n_nodes)
for (row in 0:(n_rows-1)) {
start_index <- row * n_cols + 1
node_colors[start_index:(start_index+2)] <- "orange"
node_colors[(start_index+3):(start_index+4)] <- "purple"
}
node_labels <- 1:n_nodes
plot(g,
layout = layout,
vertex.color = node_colors,
vertex.label = node_labels,
vertex.label.color = "black",
vertex.size = 15,
edge.color = "gray",
main = "Rectangular Undirected Network")
I would like to use an evolutionary algorithm to partition this graph network into 5 induced subgraphs such that each subgraph has at least 5 nodes and purple "wins" in the majority of all subgraphs. I would like to identify different such options (e.g. 5th option in the first photo) in which purple wins and each subgraph has at least 5 nodes.
Here is the approach I am currently using:
library(igraph)
library(GA)
library(ggplot2)
library(dplyr)
library(gridExtra)
# original graph
n_rows <- 10
n_cols <- 5
g <- make_lattice(dimvector = c(n_cols, n_rows))
n_nodes <- vcount(g)
node_colors <- rep("white", n_nodes)
for (row in 0:(n_rows-1)) {
start_index <- row * n_cols + 1
node_colors[start_index:(start_index+2)] <- "orange"
node_colors[(start_index+3):(start_index+4)] <- "purple"
}
# define fitness function based on constraints
fitness <- function(solution) {
subgraphs <- split(1:n_nodes, solution)
# check if all subgraphs have at least 5 nodes
if (any(sapply(subgraphs, length) < 5)) {
return(-Inf)
}
# count purple wins
purple_wins <- sum(sapply(subgraphs, function(sg) {
sum(node_colors[sg] == "purple") > sum(node_colors[sg] == "orange")
}))
return(purple_wins)
}
# genetic Algorithm
ga_result <- ga(
type = "permutation",
fitness = fitness,
min = 1,
max = 5,
popSize = 50,
maxiter = 1000,
run = 100,
pmutation = 0.2,
monitor = FALSE,
keepBest = TRUE
)
# multiple solutions
n_solutions <- 3 # Number of solutions to display
solutions <- ga_result@solution[1:n_solutions,]
for (i in 1:n_solutions) {
cat("Solution", i, "\n")
cat("Fitness score:", ga_result@fitness[i], "\n\n")
plot_subgraphs(solutions[i,], i)
cat("\n")
}
The code successfully ran:
Solution 1
Fitness score: 2
Subgraph 1 : 4 9 14 19 24 29 34 39 44 49
Purple: 10 Orange: 0
Subgraph 2 : 3 8 13 18 23 28 33 38 43 48
Purple: 0 Orange: 10
Subgraph 3 : 2 7 12 17 22 27 32 37 42 47
Purple: 0 Orange: 10
Subgraph 4 : 1 6 11 16 21 26 31 36 41 46
Purple: 0 Orange: 10
Subgraph 5 : 5 10 15 20 25 30 35 40 45 50
Purple: 10 Orange: 0
Solution 2
Fitness score: 2
Subgraph 1 : 5 10 15 20 25 30 35 40 45 50
Purple: 10 Orange: 0
Subgraph 2 : 2 7 12 17 22 27 32 37 42 47
Purple: 0 Orange: 10
Subgraph 3 : 4 9 14 19 24 29 34 39 44 49
Purple: 10 Orange: 0
Subgraph 4 : 1 6 11 16 21 26 31 36 41 46
Purple: 0 Orange: 10
Subgraph 5 : 3 8 13 18 23 28 33 38 43 48
Purple: 0 Orange: 10
But there are some major problems in the approach I used. For example:
The fitness function seems to be only exploring vertical partitions and can not find a single valid solution in which purple wins overall.
Can someone please show me how to correct these problems?
Related:
I am not familiar with GA, but I am suspicious of its feasibility in your question. Your question needs to consider the connectivity of each sub-group, which should be particularly designed with that specific constraint, instead of relying on the random optimization in GA.
As said before, the answer below will not be a Genetic Algorithm approach, but it should work as you described in the question
f <- function(rndSeed = 0) {
set.seed(rndSeed)
purplenodes <- as.character(c(outer(4:5, seq(0, by = 5, length.out = 10), `+`)))
minsubgsz <- 5
nrsubg <- 5
g <- g %>%
set_vertex_attr("name", value = seq.int(vcount(.)))
repeat {
gg <- g
valid <- TRUE
vlst <- setNames(vector("list", nrsubg), seq.int(nrsubg))
szsubg <- rmultinom(1, vcount(g) - nrsubg * minsubgsz, rep(1, nrsubg)) + minsubgsz
for (i in seq_along(szsubg)) {
deg <- degree(gg)
idx <- which(deg == min(deg))
vlst[[i]] <- names(bfs(gg,
# add more randomness for trials, and pick a new random root from vertices with the minimum degree
V(gg)[idx[sample.int(length(idx), 1)]],
callback = \(graph, data, extra) data["rank"] == szsubg[i]
)$order)
if (is_connected(induced_subgraph(gg, vlst[[i]]))) {
gg <- induced_subgraph(gg, V(gg)[!names(V(gg)) %in% vlst[[i]]])
} else {
valid <- FALSE
break
}
}
if (valid) {
purplewin <- sapply(vlst, \(x) mean(x %in% purplenodes)) > 0.5
if (sum(purplewin) >= 0.5 * nrsubg) {
break
}
}
}
# visualize the partitions
g %>%
set_vertex_attr("color",
value = with(stack(vlst), ind[match(names(V(.)), values)])
) %>%
plot(
layout = layout,
vertex.label = V(.)$name,
vertex.label.color = "black",
vertex.size = 15,
edge.color = "gray",
main = "Rectangular Undirected Network"
)
# return the desired clustering
return(vlst)
}
where
rmultinom
is used to generate random sizes of each group, where minsubgsz
denotes the min sub-graph sizebfs
is used to iterate through the connected vertices, while following the size of each group in szsubg
purplewin
checks if the purple wins in each of grouprepeat
loop keeps running the above process until it finds a partition where purple wins by least 3 out of 5 groups.> f()
$`1`
[1] "5" "4" "10" "3" "9" "15" "2"
$`2`
[1] "1" "6" "7" "11" "8" "12" "16" "13"
$`3`
[1] "14" "19" "18" "20" "24" "17" "23" "25" "29" "22" "28"
$`4`
[1] "30" "35" "34" "40" "33" "39" "45" "32" "38" "44" "50" "27"
$`5`
[1] "49" "48" "43" "47" "42" "46" "37" "41" "36" "31" "26" "21"
and
> f(2)
$`1`
[1] "50" "45" "49" "40" "44" "48" "35" "39" "43" "47"
$`2`
[1] "46" "41" "36" "42" "31" "37" "26" "32" "38" "21" "27" "33"
$`3`
[1] "34" "29" "24" "28" "30" "19"
$`4`
[1] "25" "20" "15" "10" "14" "5" "9"
$`5`
[1] "4" "3" "2" "8" "1" "7" "13" "6" "12" "18" "11" "17" "23" "16" "22"