Search code examples
pythonpython-3.xalgorithmapproximation

Split array basing on chunk weight


I have an array with 2 <= n <= 100 doubles:

A = [a1, a2, ... , an], ai > 0

and an integer 2 <= k <= min(n, 20). I need to split A into k subarrays:

B1 = [a1,     a2, ... , ap]
B2 = [ap+1, ap+2, ... , aq]

             ...

Bk = [aw+1, aw+2, ... , an] 

such that the sum in each B is almost equal (it's hard to give a strict definition what this means - I'm interested in an approximate solution).

Example:

Input: A = [1, 2, 1, 2, 1], k=2
Output: [[1, 2, 1], [2, 1]] or [[1, 2], [1, 2, 1]]

I tried a probabilistic approach:

  • sample from [1, 2, .., n] using A as probability weights

  • cut the sample into quantiles to find a good partition,

but this was not stable enough for production.

tl;dr This question asks about 2-chunk divisions. I need k-chunk division.


Solution

  • Calculate overall sum of array S. Every chunk sum should be near S / K.

    Then walk through array, calculating running sum R. When R+A[i+1] - S/K becomes larger than S/K - R, close current chunk and make R=0. Continue with the next chunk.

    You also can compensate accumulating error (if it occurs), comparing overall sum of M chunks with M * S / K

    Quick-made code for the last approach (not thoroughly checked)

    def chunks(lst, k):
        s = sum(lst)
        sk = s / k
        #sk = max(s / k, max(lst))
        #variant from user2052436 in comments  
        idx = 0
        chunkstart = 0
        r = 0
        res = []
        for m in range(1, k):
            for idx in range(chunkstart, len(lst)):
                km = k -m
                irest = len(lst)-idx
                if((km>=irest) or (2*r+lst[idx]>2*m*sk)) and (idx>chunkstart):
                    res.append(lst[chunkstart:idx])
                    chunkstart = idx
                    break
                r += lst[idx]
        res.append(lst[idx:len(lst)])
        return res
    
    print(chunks([3,1,5,2,8,3,2], 3))
    print(chunks([1,1,1,100], 3))
    
    >>>[[3, 1, 5], [2, 8], [3, 2]]
       [[1, 1], [1], [100]]