Search code examples
rggplot2mathematical-optimizationminimizemaximize

Optimization in R: Maximizing and Minimizing Many Variables


I have a dataset with 70 foods and information about each food's nutritional value (protein/oz., fat/oz., cals/oz., etc.), as well as the food's cost/oz. I am trying to figure out--given a set budget in $--what the best combination of foods (and the amt. of each food) would be to maximize protein, minimize fat, minimize calories, etc. I aim to do this across a series of price points, and to plot each.

I found a whole bunch of different packages that could help with this here: http://cran.r-project.org/web/views/Optimization.html. However, I'm a beginner and am not sure what would be most helpful/where to start - would love some suggestions from anyone familiar with solving these kinds of optimization problems.


Solution

  • This is called the diet problem and it is a popular introduction to linear programming (see, e.g., the first Google hit I found for the diet problem). Linear programming solvers through packages such as lpSolve can be used to solve many variants of the diet problem.

    For instance, consider the version of the problem in the link above, where you have the following foods to choose from:

    (food <- data.frame(Food=c("Corn", "2% Milk", "Wheat Bread"), CostPerServing=c(.18, .23, .05), VitaminA=c(107, 500, 0), Calories=c(72, 121, 65)))
    #          Food CostPerServing VitaminA Calories
    # 1        Corn           0.18      107       72
    # 2     2% Milk           0.23      500      121
    # 3 Wheat Bread           0.05        0       65
    

    Assume you wanted to wanted to find the number of servings of each food that minimize total cost subject to the constraint that the total calories must be between 2000 and 2500 and the amount of Vitamin A must be between 5000 and 50000. If you defined variables X1, X2, and X2, then your objective is .18*X1 + .23*X2 + .05*X3, a linear function of the variables. Similarly each of your constraints in a linear function of the variables; for instance, the lower bound on the number of calories is a constraint of the form 72*X1 + 121*X2 + 65*X3 >= 2000.

    The lp function from the lpSolve package takes as input a vector indicating the coefficients in the objective value, and information about constraints (a constraint matrix, the direction of each constraint, and the right-hand side of each constraint). For the stated problem, this would be:

    library(lpSolve)
    mod <- lp("min",  # min/max
              food$CostPerServing,  # Objective
              rbind(food$VitaminA, food$VitaminA, food$Calories, food$Calories),  # Constraint matrix
              c(">=", "<=", ">=", "<="),  # Constraint directions
              c(5000, 50000, 2000, 2500))
    

    After the model has solved, you can look at the objective function and values:

    mod$objval
    # [1] 2.907692
    mod$solution
    # [1]  0.00000 10.00000 12.15385
    sum(food$VitaminA * mod$solution)
    # [1] 5000
    sum(food$Calories * mod$solution)
    # [1] 2000
    

    The cheapest cost to meet the constraints would be $2.91, and you can achieve this by using 0 servings of corn, 10 servings of 2% milk, and 12.15 servings of wheat bread. This yields exactly 5000 units of Vitamin A and exactly 2000 calories.