Search code examples
pythonperformancemathoptimizationcombinations

Find all possible sums of the combinations of sets of integers, efficiently


I have an algorithm that finds the set of all unique sums of the combinations of k tuples drawn with replacement from of a list of tuples. Each tuple contains n positive integers, the order of these integers matters, and the sum of the tuples is defined as element-wise addition. e.g. (1, 2, 3) + (4, 5, 6) = (5, 7, 9)

Simple example for k=2 and n=3:

input = [(1,0,0), (2,1,1), (3,3,2)]  
solution = [(1,0,0)+(2,1,1), (1,0,0)+(3,3,2), (2,1,1)+(3,3,2), (1,0,0)+(1,0,0), (2,1,1)+(2,1,1), (3,3,2)+(3,3,2)]  
solution = [(3,1,1), (4,3,2), (5,4,3), (2,0,0), (4,2,2), (6,6,4)]

In practice the integers in the tuples range from 0 to 50 (in some positions it may be a lot more constraint, like [0:2]), k goes up to 4 combinations, and the length of the tuples goes up to 5. The number of tuples to draw from goes up to a thousand.

The algorithm I currently have is an adaptation of an algorithm proposed in a related question, it's more efficient then enumerating all combinations with itertools (if we're drawing 4 tuples out of 1000, there are billions of combinations, but the number of sums will be orders of magnitude less), but I don't see how to apply bitsets for example to this problem.

# example where length of tuples n = 3:
lst = []
for x in range(0,50,2):
    for y in range(0, 20, 1):
        for z in range(0, 3, 1):
            lst.append((x,y,z))

# this function works for any k and n
def unique_combination_sums(lst, k):
    n = len(lst[0])
    sums = {tuple(0 for _ in range(n))}  # initialize with tuple of zeros
    for _ in range(k):
        sums = {tuple(s[i]+x[i] for i in range(n)) for s in sums for x in lst}
    return sums

unique_combination_sums(lst, 4)

Solution

  • You can actually encode your tuples as integers. Since you mention that the integers range [0, 50], and there may be up to 5 such integers, so that creates a range of 51^5 = 345,025,251 values, which is perfectly doable.

    To understand how we can do this encoding, think about how decimal numbers work- 123 means 1*100 + 2*10 + 1*1. Each digit is multiplied by the base (10) raised to some power, corresponding to its position. Each number has only one representation, because each digit is less than the base (10) itself. We could do something similar then; we could choose a sufficiently large base, say 100, and multiply each value in the tuple by the base to its corresponding power. Take the following example:

    (1, 4, 7)
    -> 1*100^2 + 4*100^1 + 7*100^0
    -> 1*10000 + 4*100   + 7
    -> 10407
    

    This work work perfectly well by itself, however whatever underlying solver you're using for the integer case may very well perform better on smaller numbers, so we really should try to "compact" the representation as much as possible. This means picking the smallest base possible. In fact, it means picking multiple bases, for a mixed-radix number system. Without going into too much detail, it means that if one position of the tuples only spans a small interval of integers, we won't "waste" space for values that won't ever exist at that specific tuple position. What this may look like, for an arbitrary example:

    (1, 4, 7, 11)
    -> 1*22*7*15 + 4*22*7 + 7*22 + 11*1
    -> 2310      + 616    + 154  + 11
    -> 3091
    // Here we arbitrarily choose the radices [22, 7, 15]
    // In practice, we actually choose meaningful (and minimal) radices
    

    Furthermore, we can also subtract off the smallest value at tuple position, to shrink the values even further. We just have to remember to add back the appropriate offset multiplied by the number of elements when we convert the value back to a tuple.

    All that said, here's the code to do exactly that:

    from functools import wraps
    
    
    def transform_tuples(func):
        @wraps(func)
        def inner(arr, n):
            if n == 0 or not arr:
                return set()
            
            groups = [(max(g)-min(g), min(g)) for g in zip(*arr)]
            
            def encode(tup):
                val = 0
                for (size, low), elem in zip(groups, tup):
                    val *= size * n + 1
                    val += elem - low
                return val
                
            def decode(val):
                tup = []
                for size, low in groups[::-1]:
                    val, part = divmod(val, size * n + 1)
                    tup.append(part + low * n)
                return tuple(tup[::-1])
                
            result = func([encode(tup) for tup in arr], n)
            return [decode(val) for val in result]
        
        return inner
    

    This is a decorator- you apply it to function the solves the original integer-based problem, and it will transform the function into one that operates on tuples.

    For example, taking the Kelly1 solution from the related question you linked above, we can decorate it, and it will then work on tuples:

    @transform_tuples
    def Kelly1(a, n):
        sums = {0}
        for _ in range(n):
            sums = {s + x for s in sums for x in a}
        return sums
    

    Calling it on your example:

    tuples = [(1,0,0), (2,1,1), (3,3,2)]
    k = 2
    
    print(Kelly1(tuples, k))
    

    Produces:

    [(2, 0, 0), (5, 4, 3), (3, 1, 1), (6, 6, 4), (4, 2, 2), (4, 3, 2)]
    

    So you can take whichever implementation is the fastest, tweak it / optimize it as you like, and then decorate it to operate on tuples.