Search code examples
roptimizationlinear-programminglpsolve

How to set up linear programming optimization in R using LpSolve?


For example, I have this sample data:

d=data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))

Which looks like this:

  x y w
1 1 5 1
2 1 6 2
3 1 7 3
4 2 8 4
5 2 7 5
6 3 5 6
7 4 6 7
8 4 5 8

x and y represent indices from datax and datay. w represents a score from comparing datax[x] with datay[y]. I want to maximize the total score (or w) from d, where each value of x is matched to at most one value of y, and vice versa.

The result should look like this:

  x y w
1 2 7 5
2 3 5 6
3 4 6 7

Where the sum of all w values is maximized, and each x and each y show up only once in the result.

How do I set this problem up in the lpSolve::lp function?


Solution

  • You can use lpSolveAPI to solve your problem. Your stated solution is not quite feasible given your constraints. So let's go with you wanting X's and Y's to not repeat in the solution.

    You will need 8 new binary variables. Each variables specifies whether or not that row in d is getting picked (1) or dropped (0).

    Update based on OP's request

    Yes, the lpSolveAPI code (below) makes it look more complicated than it really is. This LP formulation (output of lpSolveAPI) should make things clearer:

    /* Objective function */
    max: +pick_1 +2 pick_2 +3 pick_3 +4 pick_4 +5 pick_5 +6 pick_6 +7 pick_7 +8 pick_8;
    
    /* Constraints */
    OneX_1: +pick_1 +pick_2 +pick_3 <= 1;
    OneX_2: +pick_4 +pick_5 <= 1;
    OneX_4: +pick_7 +pick_8 <= 1;
    OneY_5: +pick_1 +pick_6 +pick_8 <= 1;
    OneY_6: +pick_2 +pick_7 <= 1;
    OneY_7: +pick_3 +pick_5 <= 1;
    
    /* Variable bounds */
    pick_1 <= 1;
    pick_2 <= 1;
    pick_3 <= 1;
    pick_4 <= 1;
    pick_5 <= 1;
    pick_6 <= 1;
    pick_7 <= 1;
    pick_8 <= 1;
    

    Explanation: The second constraint (OneX_2) simply states that only one of pick_4 or pick_5 can be 1, since the 4th and 5th rows in the data frame d have X = 2

    Solution

    Note that the formulation above produces an optimal solution that selects 4 rows in the data frame d

    > d[c(3,4,6,7),]
      x y w
    3 1 7 3
    4 2 8 4
    6 3 5 6
    7 4 6 7
    

    The sum of w's is 20, which is better than the solution in the question.

    Code

    library(lpSolveAPI)
    d <- data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))
    
    ncol <- 8 #you have eight rows that can be picked or dropped from the solution set
    lp_rowpicker <- make.lp(ncol=ncol)
    set.type(lp_rowpicker, columns=1:ncol, type = c("binary"))
    
    obj_vals <- d[, "w"]
    set.objfn(lp_rowpicker, obj_vals) 
    lp.control(lp_rowpicker,sense='max')
    
    #Add constraints to limit X values from repeating
    add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
                   indices=c(1,2,3), rhs=1, type="<=")
    add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
                   indices=c(4,5), rhs=1, type="<=")
    add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
                   indices=c(7,8), rhs=1, type="<=") #x's in dataframe rows 7 & 8 are both '4'
    
    #Add constraints to limit Y values from repeating
    add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
                   indices=c(1,6,8), rhs=1, type="<=") #Y's in df rows 1,6 & 8 are all '5'
    add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
                   indices=c(2,7), rhs=1, type="<=") #Y's in dataframe rows 2&7 are both '6'
    add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
                   indices=c(3,5), rhs=1, type="<=") #y's in dataframe rows 3&5 are both '7'
    
    solve(lp_rowpicker)
    get.objective(lp_rowpicker) #20
    get.variables(lp_rowpicker)
    #[1] 0 0 1 1 0 1 1 0
    #This tells you that from d you pick rows: 3,4,6 & 7 in your optimal solution.
    
    #If you want to look at the full formulation:
    rownames1 <- paste("OneX", c(1,2,4), sep="_")
    rownames2 <- paste("OneY", c(5,6,7), sep="_")
    colnames<- paste("pick_",c(1:8), sep="")
    dimnames(lp_rowpicker) <- list(c(rownames1, rownames2), colnames)
    print(lp_rowpicker)
    
    #write it to a text file
    write.lp(lp_rowpicker,filename="max_w.lp")
    

    Hope this gives you an idea of how to use lpSolveAPI to formulate your problem.