Search code examples
algorithmdata-structuresdynamic-programmingsubset-sumclrs

maximum sum of a subset of size K with sum less than M


Given: array of integers value K,M

Question: Find the maximum sum which we can obtain from all K element subsets of given array such that sum is less than value M?

is there a non dynamic programming solution available to this problem? or if it is only dp[i][j][k] can only solve this type of problem! can you please explain the algorithm.


Solution

  • Many people have commented correctly that the answer below from years ago, which uses dynamic programming, incorrectly encodes solutions allowing an element of the array to appear in a "subset" multiple times. Luckily there is still hope for a DP based approach.

    Let dp[i][j][k] = true if there exists a size k subset of the first i elements of the input array summing up to j

    Our base case is dp[0][0][0] = true

    Now, either the size k subset of the first i elements uses a[i + 1], or it does not, giving the recurrence

    dp[i + 1][j][k] = dp[i][j - a[i + 1]][k - 1] OR dp[i][j][k]

    Put everything together:

    given A[1...N]
    initialize dp[0...N][0...M][0...K] to false
    dp[0][0][0] = true
    for i = 0 to N - 1:
        for j = 0 to M:
            for k = 0 to K:
                if dp[i][j][k]:
                    dp[i + 1][j][k] = true
                if j >= A[i] and k >= 1 and dp[i][j - A[i + 1]][k - 1]:
                    dp[i + 1][j][k] = true
    max_sum = 0
    for j = 0 to M:
        if dp[N][j][K]:
            max_sum = j
    return max_sum
    

    giving O(NMK) time and space complexity.

    Stepping back, we've made one assumption here implicitly which is that A[1...i] are all non-negative. With negative numbers, initializing the second dimension 0...M is not correct. Consider a size K subset made up of a size K - 1 subset with sum exceeding M and one other sufficiently negative element of A[] such that overall sum no longer exceeds M. Similarly, our size K - 1 subset could sum to some extremely negative number and then with a sufficiently positive element of A[] sum to M. In order for our algorithm to still work in both cases we would need to increase the second dimension from M to the difference between the sum of all positive elements in A[] and the sum of all negative elements (the sum of the absolute values of all elements in A[]).

    As for whether a non dynamic programming solution exists, certainly there is the naive exponential time brute force solution and variations that optimize the constant factor in the exponent.

    Beyond that? Well your problem is closely related to subset sum and the literature for the big name NP complete problems is rather extensive. And as a general principle algorithms can come in all shapes and sizes -- it's not impossible for me to imagine doing say, randomization, approximation, (just choose the error parameter to be sufficiently small!) plain old reductions to other NP complete problems (convert your problem into a giant boolean circuit and run a SAT solver). Yes these are different algorithms. Are they faster than a dynamic programming solution? Some of them, probably. Are they as simple to understand or implement, without say training beyond standard introduction to algorithms material? Probably not.

    This is a variant of the Knapsack or subset-problem, where in terms of time (at the cost of exponential growing space requirements as the input size grows), dynamic programming is the most efficient method that CORRECTLY solves this problem. See Is this variant of the subset sum problem easier to solve? for a similar question to yours.

    However, since your problem is not exactly the same, I'll provide an explanation anyways. Let dp[i][j] = true, if there is a subset of length i that sums to j and false if there isn't. The idea is that dp[][] will encode the sums of all possible subsets for every possible length. We can then simply find the largest j <= M such that dp[K][j] is true. Our base case dp[0][0] = true because we can always make a subset that sums to 0 by picking one of size 0.

    The recurrence is also fairly straightforward. Suppose we've calculated the values of dp[][] using the first n values of the array. To find all possible subsets of the first n+1 values of the array, we can simply take the n+1_th value and add it to all the subsets we've seen before. More concretely, we have the following code:

    initialize dp[0..K][0..M] to false
    dp[0][0] = true
    for i = 0 to N:
        for s = 0 to K - 1:
            for j = M to 0:
                if dp[s][j] && A[i] + j < M:
                    dp[s + 1][j + A[i]] = true
    for j = M to 0:
        if dp[K][j]:
            print j
            break