Search code examples
c++arraysalgorithmmultidimensional-arrayknapsack-problem

Multidimensional Knapsack with minimum and maximum


I have a problem which is similar to the Knapsack problem, more specifically the multidimensional variation.

I have a bunch of objects which all have a cost, a value, and a category. I need to the Knapsack optimisation for value under a maximum cost, but also have a specific number of objects in each category.

I have successfully implemented in C++ the original knapsack algorithm, without paying attention to the categories.

When I tried to add the categories, I figured out that I could simply treat this as a multidimensional knapsack problem, which each categories being a weight of either 0 or 1 in a new dimension.

My main problem is that I do not only have a maximum, ex: 5 objects of type food, but also a minimum, since I need exactly 5 objects of type food.

And I can't figure out how to add a minimum into the algorithm.

Obviously, I can use a general case, where every dimension has a maximum and minimum, and optimise for total, since all my dimensions but one only have a range of 1, so this would end up optimising for value anyway. Furthermore, I can set the minimum for value to zero, to avoid having one dimension without a minimum, and it would still work.

I'm working in C++, but honestly even pseudo-code would be fine, I just need the algorithm.

Obviously I also need it to be fast, if possible as fast as the multidimensional variation.

Here is an example of the test case. As this is mostly an optimization problem, the instance is huge, but it should work on any instance size. The number of possible categories and number of category fields is fixed.

You have a backpack that can hold a maximum of 100 units of weight, and a list of 1000 objects, each object having a value, a weight and a type. You specifically need to bring exactly 10 objects of type food, 15 objects of type clothing and 5 Tools. Every object has a completely arbitrary (but greater than 0) value in dollars, and weight in units. I would need to find the optimal configuration for value respecting the maximum weight and the specific number of each type of items.

The list of objects will always contain at least one valid configuration, which means that it will always have at least enough objects of every type that will end up Under the maximum weight, so I don't have to plan for the "no answer" case. I just have to find the best answer for a (probably) huge number of available items.


Solution

  • Knowing exactly how many items can be selected from each category is a big constraint. Consider the simplest case that there is one category. You much choose exactly N objects to maximize the value sum[v_i x_i] for a cost sum[w_i x_i] < W, where x_i is equal to 0 or 1 (following Wikipedia's notation). The new constraint is that sum[x_i] = N. This constraint can be included in the problem by adding another dimension in the dynamic programming, but explicitly checking whether or not a solution is valid and has exactly the number of required elements.

    Vanilla knapsack problem

    Here is a brief demonstration: Take as the starting point this solution to the standard 0/1 knapsack problem via memoization:

    #include <cstdio>
    #include <iostream>
    #include <vector>
    #include <map>
    #include <algorithm>
    
    using uint = unsigned int;
    
    template <typename T>
    struct item {
        T value;
        uint weight;
    };
    
    template <typename T>
    T knapSack(uint W, const std::vector< item<T> >& items) {
        
        std::map< std::pair<uint, uint>, T> cache;
        
        std::function<T(uint, uint)> recursion;
        recursion = [&] (uint n, uint w) {
            if (n == 0)
                return 0;
            auto it = cache.find(std::make_pair(n,w));
            if (it != cache.end())
                return it->second;
            T _v = items[n-1].value;
            uint _w = items[n-1].weight;
            T nextv;
            if (_w <= w)
                nextv = std::max(_v + recursion(n-1,w-_w),recursion(n-1,w));
            else
                nextv = recursion(n-1,w);
            cache.insert(std::make_pair(std::make_pair(n,w),nextv));
            return nextv;   
        };
    
        return recursion(items.size(),W);
    }
    

    My implementation here (with a recursive lambda function) emphasizes readability over optimality. A choice of objects with indices < N and sum of weights < W is either a choice of objects with indices < N-1 and sum of weights < W or the object at N-1 together with objects with indices < N-1 and sum of weights < W - w[N-1].

    Knapsack problem with one category with fixed required number of objects

    We can go on to add a new restriction that keeps track of the number of elements selected. We'll do this by noting that the new choice of objects at each recursive step has 0 or 1 more elements than the one before it, in the same way that it has the same or greater sum of weights -- i.e., a choice of K objects with indices < N and sum of weights < W is either a choice of K objects with indices < N-1 and sum of weights < W or the object at N-1 together with K-1 objects with indices < N-1 and sum of weights < W - w[N-1]. However, we'll also want to track violations -- for instance, we can't find K objects with indices < N when K>N. In such a case we should report that the maximum possible value is 0, because the choice is impossible, but we should mark this as "invalid" to differentiate this from the the trivial base case of the recursion. Moreover, any solution higher up the chain that attempts to use this as a sub-solution should also be marked as invalid. For this reason, we change the return type from a simple value to a pair of a value and boolean. As part of the base case, we mark all entries with K>N as having a maximum value of 0 but as invalid:

    template <typename T>
    std::pair<T,bool> knapSackConstrained(uint W, uint K, const std::vector< item<T> >& items) {
        
        std::map< std::tuple<uint, uint, uint>, std::pair<T,bool> > cache;
        
        std::function<std::pair<T, bool>(uint, uint, uint)> recursion;
        recursion = [&] (uint n, uint w, uint k) {
            if (k > n)
                return std::make_pair(0,false);
            if (n == 0 || k == 0)
                return std::make_pair(0,true);
            auto it = cache.find(std::make_tuple(n,w,k));
            if (it != cache.end())
                return it->second;
            T _v = items[n-1].value;
            uint _w = items[n-1].weight;
            T nextv;
            bool nextvalid = true;
            if (_w <= w) {
                auto take = recursion(n-1,w-_w,k-1);
                auto reject = recursion(n-1,w,k);
                if (take.second and reject.second) {
                    nextv = std::max(_v + take.first,reject.first);
                } else if (take.second) {
                    nextv = _v + take.first;
                } else if (reject.second) {
                    nextv = reject.first;
                } else {
                    nextv = 0;
                    nextvalid = false;
                }   
            } else {
                std::tie(nextv,nextvalid) = recursion(n-1,w,k);
            }
            std::pair<T,bool> p = std::make_pair(nextv,nextvalid);
            cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
            return p;   
        };
    
        return recursion(items.size(),W,K);
    }
    

    Here is a simple main routine running this code and its output:

    int main(int argc, char *argv[]) {
        std::vector< item<int> > items = {{60,10},{10,6},{10,6}};
        int  j = 13;
        std::cout << "Unconstrained: " << knapSack(j,items) << std::endl;
        for (uint k = 1; k <= items.size(); ++k) {
            auto p = knapSackConstrained(j,k,items);
            std::cout << "K = " << k << ": " << p.first;
            if (p.second)
                std::cout << std::endl;
            else
                std::cout << ", no valid solution" << std::endl;
        }
        return 0;
    }
    
    % OUTPUT %
    Unconstrained: 60
    K = 1: 60
    K = 2: 20
    K = 3: 0, no valid solution
    

    Since the sum of the 3 weights is already greater than the threshold, the solution requiring all three is impossible.

    Knapsack problem with multiple categories with fixed required number of objects

    The above only partly addresses your problem, since you have multiple categories rather than simply one. However, I believe that this can be extended to be multi-dimensional without too much extra work. Indeed, I suspect the following code is the right strategy for the multi-dimensional case, modulo bugs -- it would require some good test cases for verification. The single argument K is replaced with a vector of category numbers, and the item struct is given a category field. The base case must consider each possible K>N case (for each category), and in addition the check that the (i-1)st weight is less than W must be extended to check that there is at least 1 item more required of the (i-1)st category.

    #include <cstdio>
    #include <iostream>
    #include <vector>
    #include <map>
    #include <algorithm>
    
    using uint = unsigned int;
    
    template <typename T>
    struct item {
        T value;
        uint weight;
        uint category;
    };
    
    template <typename T>
    std::pair<T,bool> knapSack(uint W, const std::vector<uint>& K, const std::vector< item<T> >& items) {
    
        std::map< std::tuple<uint, uint, std::vector<uint> >, std::pair<T,bool> > cache;
        
        std::function<std::pair<T, bool>(uint, uint, std::vector<uint>)> recursion;
        recursion = [&] (uint n, uint w, std::vector<uint> k) {
            
            auto it = cache.find(std::make_tuple(n,w,k));
            if (it != cache.end())
                return it->second;
            
            std::vector<uint> ccount(K.size(),0);
            for (uint c = 0; c < K.size(); ++c) {
                for (uint i = 0; i < n; ++i) {
                    if (items[i].category == c)
                        ++ccount[c];
                }
            }
            for (uint c = 0; c < k.size(); ++c) {
                if (k[c] > ccount[c]) {
                    auto p = std::make_pair(0,false);
                    cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                    return p;
                }
            }
    
            uint sumk = 0; for (const auto& _k : k) sumk += _k;
            if (n == 0 || sumk == 0) {
                auto p = std::make_pair(0,true);
                cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                return p;
            }
    
            T _v = items[n-1].value;
            uint _w = items[n-1].weight;
            uint _c = items[n-1].category;
            T nextv;
            bool nextvalid = true;
            if (_w <= w and k[_c] > 0) {
                std::vector<uint> subk = k;
                --subk[_c];
                auto take = recursion(n-1,w-_w,subk);
                auto reject = recursion(n-1,w,k);
                if (take.second and reject.second) {
                    nextv = std::max(_v + take.first,reject.first);
                } else if (take.second) {
                    nextv = _v + take.first;
                } else if (reject.second) {
                    nextv = reject.first;
                } else {
                    nextv = 0;
                    nextvalid = false;
                }   
            } else {
                std::tie(nextv,nextvalid) = recursion(n-1,w,k);
            }
            std::pair<T,bool> p = std::make_pair(nextv,nextvalid);
            cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
            return p;   
        };
    
        return recursion(items.size(),W,K);
    }
     
    int main(int argc, char *argv[]) {
        std::vector< item<int> > items = {{60,10,0}, {100,20,1}, {120,30,0}, {140,35,1}, {145,40,0}, {180,45,1}, {160,50,1}, {170,55,0}};
        int  j = 145;
        for (uint k1 = 0; k1 <= items.size(); ++k1) {
            for (uint k2 = 0; k2 <= items.size(); ++k2) {
                auto p = knapSack(j,std::vector<uint>({k1,k2}),items);
                if (p.second)
                    std::cout << "K0 = " << k1 << ", K1 = " << k2 << ": " << p.first << std::endl;
            }
        }
        return 0;
    }
    
    % OUTPUT (with comments) %
    
    K0 = 0, K1 = 0: 0
    K0 = 0, K1 = 1: 180 // e.g. {} from 0, {180} from 1
    K0 = 0, K1 = 2: 340 // e.g. {} from 0, {160,180} from 1
    K0 = 0, K1 = 3: 480 // e.g. {} from 0, {140,160,180} from 1
    K0 = 1, K1 = 0: 170 // e.g. {170} from 0, {} from 1
    K0 = 1, K1 = 1: 350 // e.g. {170} from 0, {180} from 1
    K0 = 1, K1 = 2: 490 // e.g. {170} from 0, {140, 180} from 1
    K0 = 1, K1 = 3: 565 // e.g. {145} from 0, {100, 140, 180} from 1
    K0 = 2, K1 = 0: 315 // e.g. {145,170} from 0, {} from 1
    K0 = 2, K1 = 1: 495 // e.g. {145,170} from 0, {180} from 1
    K0 = 2, K1 = 2: 550 // e.g. {60,170} from 0, {140,180} from 1
    K0 = 2, K1 = 3: 600 // e.g. {60,120} from 0, {100,140,180} from 1
    K0 = 3, K1 = 0: 435 // e.g. {120,145,170} from 0, {} from 1
    K0 = 3, K1 = 1: 535 // e.g. {120,145,170} from 0, {100} from 1
    K0 = 3, K1 = 2: 605 // e.g. {60,120,145} from 0, {100,180} from 1
    K0 = 4, K1 = 0: 495 // e.g. {60,120,145,170} from 0, {} from 1
    

    For a given set of items with two categories, the output appears to be correct, though my manual check might have failed to spot some problems [an earlier version of this answer did indeed have some bugs]. All the cases not printed are those with no solution.

    Returning the set of selected objects

    If you want the function to return the set of selected objects, this is no obstacle in principle -- the code just gets messier. The most human-understandable thing to do is simply add an std::set<std::size_t> to the tuple of objects returned by recursion and by knapSack, and stored in the cache, representing the collection of indices of the chosen objects. Each time a new object is added, this set can be augmented. The resulting code involves lots of copying of sets of integers, and is probably far from optimal -- a better solution might involve a static boolean vector whose entries are toggled on and off. However, it works and makes sense, so here it is:

    #include <cstdio>
    #include <iostream>
    #include <vector>
    #include <map>
    #include <set>
    #include <algorithm>
    
    using uint = unsigned int;
    
    template <typename T>
    struct item {
        T value;
        uint weight;
        uint category;
    };
    
    template <typename T>
    std::tuple<T,bool,std::set<size_t> > knapSack(uint W, std::vector<uint> K, const std::vector< item<T> >& items) {
    
        std::map< std::tuple<uint, uint, std::vector<uint> >, std::tuple<T,bool,std::set<std::size_t> > > cache;
        
        std::function<std::tuple<T,bool,std::set<std::size_t> >(uint, uint, std::vector<uint>&)> recursion;
    
        recursion = [&] (uint n, uint w, std::vector<uint>& k) {
            
            auto it = cache.find(std::make_tuple(n,w,k));
            if (it != cache.end())
                return it->second;
            
            std::vector<uint> ccount(K.size(),0);
            for (uint i = 0; i < n; ++i) {
                ++ccount[items[i].category];
            }
            
            for (uint c = 0; c < k.size(); ++c) {
                if (k[c] > ccount[c]) {
                    auto p = std::make_tuple(0,false,std::set<std::size_t>{});
                    cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                    return p;
                }
            }
            uint sumk = 0; for (const auto& _k : k) sumk += _k;
            if (n == 0 || sumk == 0) {
                auto p = std::make_tuple(0,true,std::set<std::size_t>{});
                cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                return p;
            }
    
            T _v = items[n-1].value;
            uint _w = items[n-1].weight;
            uint _c = items[n-1].category;
            T nextv;
            bool nextvalid = true;
            std::set<std::size_t> nextset;
            if (_w <= w and k[_c] > 0) {
                --k[_c];
                auto take = recursion(n-1,w-_w,k);
                ++k[_c];
                auto reject = recursion(n-1,w,k);
                T a = _v + std::get<0>(take);
                T b = std::get<0>(reject);
                if (std::get<1>(take) and std::get<1>(reject)) {
                    nextv = std::max(a,b);
                    if (a > b) {
                        nextset = std::get<2>(take);
                        nextset.insert(n-1);
                    } else {
                        nextset = std::get<2>(reject);
                    }
                } else if (std::get<1>(take)) {
                    nextv = a;
                    nextset = std::get<2>(take);
                    nextset.insert(n-1);
                } else if (std::get<1>(reject)) {
                    nextv = b;
                    nextset = std::get<2>(reject);
                } else {
                    nextv = 0;
                    nextvalid = false;
                    nextset = {};
                }   
            } else {
                std::tie(nextv,nextvalid,nextset) = recursion(n-1,w,k);
            }
            auto p = std::make_tuple(nextv,nextvalid,nextset);
            cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
            return p;   
        };
    
        return recursion(items.size(),W,K);
    }
     
    int main(int argc, char *argv[]) {
        std::vector< item<int> > items = {{60,10,0}, {100,20,1}, {120,30,0}, {140,35,1}, {145,40,0}, {180,45,1}, {160,50,1}, {170,55,0}};
        int  j = 145;
        for (uint k1 = 0; k1 <= items.size(); ++k1) {
            for (uint k2 = 0; k2 <= items.size(); ++k2) {
                auto p = knapSack(j,std::vector<uint>({k1,k2}),items);
                if (std::get<1>(p)) {
                    std::cout << "K0 = " << k1 << ", K1 = " << k2 << ": " << std::get<0>(p);
                    std::cout << "; contents are {";
                    for (const auto& index : std::get<2>(p))
                        std::cout << index << ", ";
                    std::cout << "}" << std::endl;
                }
            }
        }
        return 0;
    }
    

    The output of this is

    K0 = 0, K1 = 0: 0; contents are {}
    K0 = 0, K1 = 1: 180; contents are {5, }
    K0 = 0, K1 = 2: 340; contents are {5, 6, }
    K0 = 0, K1 = 3: 480; contents are {3, 5, 6, }
    K0 = 1, K1 = 0: 170; contents are {7, }
    K0 = 1, K1 = 1: 350; contents are {5, 7, }
    K0 = 1, K1 = 2: 490; contents are {3, 5, 7, }
    K0 = 1, K1 = 3: 565; contents are {1, 3, 4, 5, }
    K0 = 2, K1 = 0: 315; contents are {4, 7, }
    K0 = 2, K1 = 1: 495; contents are {4, 5, 7, }
    K0 = 2, K1 = 2: 550; contents are {0, 3, 5, 7, }
    K0 = 2, K1 = 3: 600; contents are {0, 1, 2, 3, 5, }
    K0 = 3, K1 = 0: 435; contents are {2, 4, 7, }
    K0 = 3, K1 = 1: 535; contents are {1, 2, 4, 7, }
    K0 = 3, K1 = 2: 605; contents are {0, 1, 2, 4, 5, }
    K0 = 4, K1 = 0: 495; contents are {0, 2, 4, 7, }
    

    Algorithmic complexity

    This is not my forte, but I believe the runtime complexity is psuedo-polynomial, since the algorithm is very similar to the standard knapsack algorithm.