Search code examples
roptimizationgenetic-algorithm

Genetic Algorithms (or optimization) in R


I am trying to explore the area of optimization using GAs in R. I realize that R is probably not the best place for me to do this but that is what I know right now.

So here is the problem: I have a set of people that I can pick to build my house most efficiently. I am willing to spend a maximum of 100,000 for the job (the limit beyond which the solution is no longer valid) and I HAVE to pick one electrician, two plumbers, two general construction workers, and two cleanup guys from a list of 500 or so (total 7 guys for the job).

Each of these people have a rate they will charge for the job and overall value they will provide towards the job at hand (the sum total value needs to be maximized). See the data here:

Data

I have looked at R packages genalg and DeOptim where you can use the binary functions to select people that will maximize the overall value for a given price BUT I cant figure out how to put a limit on what type of worker is selected. The solution can be all electricians and while they all provide great value, they cant complete the job. So basically I am looking for a way to control the population and none of these packages seem to allow that.

Does any have ideas on how i can solve this problem? I am sure many of you have looked at variations of this problem and have excellent insight. Thanks a lot in advance for your help.

Here is the code I have obtained from forums here:

library(genalg)

iter = 10
population = 500

# import csv file
brank = read.csv("GA_Data.csv")


dataset <- data.frame(item = brank$Person.ID, survivalpoints = brank$Value, weight = brank$Labor.Cost)


weightlimit <- 100000


monitor <- function(obj) {
  minEval = min(obj$evaluations);
  plot(obj$mean, obj$best, type="p", main = obj$iter); 
}

evalFunc <- function(x) {
  current_solution_survivalpoints <- x %*% dataset$survivalpoints
  current_solution_weight <- x %*% dataset$weight

  if (current_solution_weight > weightlimit) 
    return(0) else return(-current_solution_survivalpoints)
}


GAmodel <- rbga.bin(size = length(brank$Person.ID), popSize = population, iters = iter, mutationChance = 0.01, 
                    monitorFunc = monitor, elitism = T, zeroToOneRatio = 200, evalFunc = evalFunc)

cat(summary.rbga(GAmodel))

########################
# STOP and replace "space" with "comma" in a text file

solution = c(0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0)
# dataset[solution == 1, ]
brank[solution == 1, ]

sum(brank$Value[solution == 1])
sum(brank$Labor.Cost[solution == 1])

Solution

  • Your problem is a Linear Programming (LP) task and should be solved as such. R has several packages for linear programming, here I will use lpSolve.

    I put your data into a data frame D (299 workers).

    str(D)
    # 'data.frame': 299 obs. of  4 variables:
    #  $ ID   : int  1 2 3 4 5 6 7 8 9 10 ...
    #  $ Type : Factor w/ 4 levels "Cleanup Guy",..: 4 4 4 4 4 4 4 4 4 4 ...
    #  $ Cost : int  5100 3900 5000 4500 6700 5000 4000 3500 7500 4500 ...
    #  $ Value: int  25 18 23 20 29 21 17 15 32 19 ...
    
    # Prepare constraint matrix
    A <- matrix(0, nrow = 5, ncol = 299)
    A[1, c(1:27, 279:299)] <- 1     # Plumbers
    A[2, 28:97] <- 1                # Electricians
    A[3, 98:190] <- 1               # Const Workers
    A[4, 191:278] <- 1              # Cleanup
    A[5, ] <- D$Cost                # cost <= 100000
    
    # Prepare input for LP solver
    objective.in <- D$Value
    const.mat <- A
    const.dir <- c(">=", ">=", ">=", ">=", "<=")
    const.rhs <- c(1, 2, 2, 2, 100000)
    
    # Now solve the problem
    require(lpSolve)
    sol <- lp(direction = "max", objective.in, # maximize objective function
            const.mat, const.dir, const.rhs,   # constraints
            all.bin = TRUE)                    # use binary variables only
    
    # View the solution
    sol
    ## Success: the objective function is 589
    sum(D$Cost[inds])
    ## 100000
    inds <- which(sol$solution == 1)
    D[inds, ]
    ##      ID         Type Cost Value
    ## 1     1      Plumber 5100    25
    ## 51   51  Electrician 5000    24
    ## 54   54  Electrician 3500    16
    ## 101 101 Const Worker 3500    18
    ## 102 102 Const Worker 4400    21
    ## 103 103 Const Worker 5800    27
    ## 147 147 Const Worker 5600    29
    ## 149 149 Const Worker 3700    17
    ## 196 196  Cleanup Guy 6200    30
    ## 197 197  Cleanup Guy 4200    20
    ## 256 256  Cleanup Guy 4000    27
    ## 263 263  Cleanup Guy 3800    22
    ## 264 264  Cleanup Guy 3800    33
    ## 266 266  Cleanup Guy 3700    23
    ## 267 267  Cleanup Guy 4200    30
    ## 273 273  Cleanup Guy 3600    35
    ## 274 274  Cleanup Guy 5900    28
    ## 275 275  Cleanup Guy 3700    28
    ## 282 282      Plumber 7000    37
    ## 287 287      Plumber 3800    30
    ## 297 297      Plumber 3600    37
    ## 298 298      Plumber 5900    32
    

    For explanation of the inputs see the help page of the lp() function.