python r graph-theory mathematical-optimization lpsolve

Solving the assignment problem with specific constraints

Imagine the following data (the code to reproduce all the outputs is at the end):

df

           cars horsepower year safety
1        Toyota        140 2008      4
2      Chrysler        120 2009      4
3          Ford        140 2010      5
4           BMW        150 2008      3
5 Mercedes-Benz        150 2008      3
6       Hyundai        120 2009      4
7        Jaguar        150 2007      3
8         Tesla        120 2010      5

I'd like to swap the cars to get something like:

   cars_initial    cars_match horsepower year safety horsepowerMatch yearMatch safetyMatch
1        Toyota           BMW        140 2008      4             150      2008           3
2         Tesla      Chrysler        120 2010      5             120      2009           4
3 Mercedes-Benz          Ford        150 2008      3             140      2010           5
4        Jaguar       Hyundai        150 2007      3             120      2009           4
5       Hyundai        Jaguar        120 2009      4             150      2007           3
6          Ford Mercedes-Benz        140 2010      5             150      2008           3
7      Chrysler         Tesla        120 2009      4             120      2010           5
8           BMW        Toyota        150 2008      3             140      2008           4

Now this is a typical assignment problem that was in the case above solved randomly, i.e. by cost matrix set to 0 in all cases.

What I'm interested in are the outcomes. In the above case, the solution yields the following stats:

stats

  horsepower year safety
1       0.25 0.25      0

That is to say, 1/4 of swaps had an equal horsepower, etc.

Here is my question: How to solve such assignments by setting constraints on what exactly should be the outcome statistics directly, without the trial-and-error approach with setting the costs?

For instance, what if I would like to have a solution where safety has more than 0.20 match, and year at least 0.10, like below?

desiredOutput

   cars_initial    cars_match
1        Toyota      Chrysler
2         Tesla Mercedes-Benz
3 Mercedes-Benz           BMW
4        Jaguar        Toyota
5       Hyundai         Tesla
6          Ford       Hyundai
7      Chrysler        Jaguar
8           BMW          Ford

statsDesired

  horsepower year safety
1       0.25 0.12   0.25

Of course I could just set the cost matrix to a lower number in all cases where safety of cars is equal.

But is there a way to influence the result by directly setting the constraint on what should be the outcome statistics?

Perhaps there is a way to optimize the costs in order to arrive at the desired result?

The code:

library(lpSolve)
library(dplyr)
library(tidyr)

set.seed(1)

df <- data.frame(
  cars = c("Toyota", "Chrysler", "Ford", "BMW", "Mercedes-Benz", "Hyundai", "Jaguar", "Tesla"),
  horsepower = c(140, 120, 140, 150, 150, 120, 150, 120),
  year = c(2008, 2009, 2010, 2008, 2008, 2009, 2007, 2010),
  safety = c(4, 4, 5, 3, 3, 4, 3, 5)
)

mat <- df %>% select(cars) %>%
  crossing(df %>% select(cars)) %>%
  mutate(val = 0) %>% 
  spread(cars, val)

solved <- lp.assign(mat %>% select(-cars1) %>% as.matrix())$solution

matches <- as.data.frame(solved) %>%
  setNames(., names(mat %>% select(-cars1))) %>%
  bind_cols(mat %>% select(cars1)) %>%
  gather(key, val, -cars1) %>%
  filter(val == 1) %>% select(-val, cars_initial = cars1, cars_match = key)

nms <- c("cars", paste0(names(df %>% select(-cars)), "Match"))

matches <- matches %>%
  left_join(df, by = c("cars_initial" = "cars")) %>%
  left_join(df %>% setNames(., nms), by = c("cars_match" = "cars"))

stats <- matches %>%
  summarise(
    horsepower = round(sum(horsepower == horsepowerMatch) / n(), 2),
    year = round(sum(year == yearMatch) / n(), 2),
    safety = round(sum(safety == safetyMatch) / n(), 2)
  )

desiredOutput <- data.frame(cars_initial = matches$cars_initial, cars_match = c("Chrysler", "Mercedes-Benz", "BMW", "Toyota", "Tesla", "Hyundai", "Jaguar", "Ford"))

statsDesired <- desiredOutput %>%
  left_join(df, by = c("cars_initial" = "cars")) %>%
  left_join(df %>% setNames(., nms), by = c("cars_match" = "cars")) %>%
  summarise(
    horsepower = round(sum(horsepower == horsepowerMatch) / n(), 2),
    year = round(sum(year == yearMatch) / n(), 2),
    safety = round(sum(safety == safetyMatch) / n(), 2)
  )

I hope the examples above are sufficient, this is my first question so please let me know if I need to provide something more.

The code is in R, but I have also added the tag Python as I don't really mind the language of possible solutions.

Solution

Here is a partial formulation of this problem as an integer programming (IP) problem.

Let I be the set of car types. For car types i and j in I, let:

h[i,j] = 1 if cars i and j have the same horsepower
y[i,j] = 1 if cars i and j have the same year
and similarly for s[i,j] (safety)

These are parameters, meaning inputs to your model. (You'll need to write code to calculate these binary quantities based on your data table.)

Now introduce the following decision variables, i.e., variables that your IP model will choose values of:

x[i,j] = 1 if we assign car type j as type i's match

Now, normally an IP has an objective function that we want to minimize or maximize. In this case, there is no objective function -- you just want to find a set of matches that satisfies your constraints. So your objective function can just be:

minimize 0

Here is the first constraint. It says: At least a of the matches must have the same horsepower. (a is a fraction.) The left-hand side is the number of matches that have the same horsepower: For each pair of car types i and j, if j is assigned as i's match and they have the same horsepower, count a 1; otherwise, count a 0. The right-hand side is the number of matches you want, i.e., a fraction of the whole set.

subject to sum {i in I, j in I} h[i,j] * x[i,j] >= a * |I|

Now formulate similar constraints for the other categories.

Next, you need a constraint that says each car type i must be assigned to exactly one car type j:

subject to sum {j in I} x[i,j] == 1 for all i in I

Finally, you need constraints saying that the decision variables are binary:

subject to x[i,j] in {0,1} for all i, j in I

Now, in terms of solving this thing, you will need to either use a mathematical modeling language like AMPL or GAMS, or a package like PuLP for Python.

I hope this helps. I might have bitten off more than you can chew here.