Search code examples
ralgorithmconstraintscombinatoricssubset-sum

Counting the Number of Combinations that Match a Certain Condition


Here is my problem:

  • There are 5 candies (1,2,3,4,5) - each candy has the same "value" associated with its number (i.e. candy_1 has value=1, candy_2 has value=2, etc.).

  • In the future, I want to be able to answer questions such as: Is it possible to pick "x" number of candies such that the values of the chosen candies sum to less than "y" and the values of the unchosen candies sum to less than "z"?

The first way to solve this problem that comes to mind is to use the "power set" (https://en.wikipedia.org/wiki/Power_set). That is, if there are "n" candies, the total number of combinations that can be made are 2^n combinations.

Step 1: For example, suppose I have 5 candies - here is a function that will identify the power set:

a <- c(1,2,3,4,5)

power_set <- function(x) {
    c(list(NULL), unlist(lapply(seq_along(x), function(i) combn(x, i, simplify = FALSE)), recursive = FALSE), list(x))
}
  
pset <- power_set(a)

Step 2: Then, for each element in the power set, I need to identify the "unchosen" candies:

find_missing <- function(x, full_set) {
  full_set[!full_set %in% x]
}

diffset <- lapply(pset, find_missing, full_set = a)

Step 3: Finally, I can use some data manipulation to bring everything into a data frame and summarize different properties of the selections (e.g. sums and counts):

pset_char <- sapply(pset, function(x) paste(x, collapse = ","))
diffset_char <- sapply(diffset, function(x) paste(x, collapse = ","))

df <- data.frame(pset = pset_char, diff = diffset_char)

df$sum_pset <- sapply(pset, sum)
df$sum_diffset <- sapply(diffset, sum)
df$count_pset <- sapply(pset, length)
df$count_diffset <- sapply(diffset, length)

The final result would look something like this:

  pset      diff sum_pset sum_diffset count_pset count_diffset
1      1,2,3,4,5        0          15          0             5
2    1   2,3,4,5        1          14          1             4
3    2   1,3,4,5        2          13          1             4
4    3   1,2,4,5        3          12          1             4
5    4   1,2,3,5        4          11          1             4
6    5   1,2,3,4        5          10          1             4

From here, I could then simply select rows that match a certain criteria (e.g. select from df where count_pset <= x & sum_pset <=y & sum_diffset <=z;)

My Question: The obvious problem with this approach is that the Power Set grows very large as the number of elements increase and this approach will become completely impractical.

Suppose I have 1000 candies (e.g. candy_1 has value = 1,... candy_999 has value = 999, candy_1000 has value = 100) and I am only interested in finding some limited number of combinations that match a certain criteria (i.e. not all of them).

Is there some other approach I can use to avoid generating the entire power set and then searching the results? For example, is there something involving recursion, backtracking, and dynamic programming that can be done to make this problem solvable? Perhaps Evolutionary Algorithms?


Solution

  • From you description, it seems you would like to enumerate all possible subset combinations that satisfies some sum constraints. I think this a variant of the "subset sum" problem, and what you want is to list all the feasible subsets

    Code

    Below is a function hopefully might solve your problem, where a recursion helper function is devised to find the constrained subsets

    f <- function(a, x, y, z) {
      helper <- function(vec, s, k) {
        v <- vec[vec <= s]
        if (k == 1) {
          return(as.list(v))
        }
        unlist(
          lapply(v, \(p) Map(c, p, helper(v[v > p], s - p, k - 1))),
          recursive = FALSE
        )
      }
    
      Filter(
        \(v) sum(a) - sum(v) <= z,
        unlist(
          lapply(
            0:x,
            \(k) helper(a, y, k)
          ),
          recursive = FALSE
        )
      )
    }
    

    Output Example

    Given input as below

    a <- 1:5
    x <- 3
    y <- 12
    z <- 10
    

    you will obtain the all feasible subsets

    > f(a, x, y, z)
    [[1]]
    [1] 5
    
    [[2]]
    [1] 1 4
    
    [[3]]
    [1] 1 5
    
    [[4]]
    [1] 2 3
    
    [[5]]
    [1] 2 4
    
    [[6]]
    [1] 2 5
    
    [[7]]
    [1] 3 4
    
    [[8]]
    [1] 3 5
    
    [[9]]
    [1] 4 5
    
    [[10]]
    [1] 1 2 3
    
    [[11]]
    [1] 1 2 4
    
    [[12]]
    [1] 1 2 5
    
    [[13]]
    [1] 1 3 4
    
    [[14]]
    [1] 1 3 5
    
    [[15]]
    [1] 1 4 5
    
    [[16]]
    [1] 2 3 4
    
    [[17]]
    [1] 2 3 5
    
    [[18]]
    [1] 2 4 5
    
    [[19]]
    [1] 3 4 5
    

    Bonus

    Further, if you want the output presented in a data.frame, you can run the following steps in addition

    lst <- f(a, x, y, z)
    dfout <- do.call(
      rbind,
      lapply(
        lst,
        \(v)
        data.frame(
          pset = toString(v),
          diffset = toString(setdiff(a, v)),
          sum_pset = sum(v),
          sum_diffset = sum(a) - sum(v),
          cnt_pset = length(v),
          cnt_diffset = length(a) - length(v)
        )
      )
    )
    

    which gives

    > dfout
          pset    diffset sum_pset sum_diffset cnt_pset cnt_diffset
    1        5 1, 2, 3, 4        5          10        1           4
    2     1, 4    2, 3, 5        5          10        2           3
    3     1, 5    2, 3, 4        6           9        2           3
    4     2, 3    1, 4, 5        5          10        2           3
    5     2, 4    1, 3, 5        6           9        2           3
    6     2, 5    1, 3, 4        7           8        2           3
    7     3, 4    1, 2, 5        7           8        2           3
    8     3, 5    1, 2, 4        8           7        2           3
    9     4, 5    1, 2, 3        9           6        2           3
    10 1, 2, 3       4, 5        6           9        3           2
    11 1, 2, 4       3, 5        7           8        3           2
    12 1, 2, 5       3, 4        8           7        3           2
    13 1, 3, 4       2, 5        8           7        3           2
    14 1, 3, 5       2, 4        9           6        3           2
    15 1, 4, 5       2, 3       10           5        3           2
    16 2, 3, 4       1, 5        9           6        3           2
    17 2, 3, 5       1, 4       10           5        3           2
    18 2, 4, 5       1, 3       11           4        3           2
    19 3, 4, 5       1, 2       12           3        3           2