Search code examples
pythonalgorithmprefix-sum

python - prefix sum algorithm


I am trying to grasp the idea behind the prefix sum concept looking at the example presented in the Prefix Sum Lesson by Codility here (The mushroom picker problem)

My understanding is that the whole concept is based on the simple property where for finding a sum of all elements between two positions A(pos_left, pos_right) of an array A a second array P is used where all elements are consecutively summed and where the searched sum is calculated as
value(P(pos_right + 1)) - value(P(pos_left)).

A 1 2 3 4 5  6
P 0 1 3 6 10 15 21
sum of all elements between A[2] and A[5] = 3+ 4 + 5 = 12
or using the prefix sums"   P[5+1] - P[2] = 15 -3 = 12 

The problem
There is a street with mushroom at every place represented by a non-empty vector. Given the initial position of a picker and its movement range, possible maximum number of mushrooms to collect is looked for.

Looking at the example I don't understand the logic behind the constuction of the loops. Can anybody clarify the mechanics of this algorithm?

Secondly, I found the positoin indexing in this example very confusing and cumbersome. Is it common practise to "shift" the vector with prefix sums with the zero in the begining? (the fact that counting elements in vectors start by defualt from 0 in python causes already some confusion).

The solution

def prefix_sums(A):
  n = len(A)
  P = [0] * (n + 1)
  for k in xrange(1, n + 1):
      P[k] = P[k - 1] + A[k - 1]
  return P


def count_total(P, x, y):
    return P[y + 1] - P[x]

# A mushroom picker is at spot number k on the road and should perform m moves
def mushrooms(A, k, m):
    n = len(A)
    result = 0
    pref = prefix_sums(A)
    for p in xrange(min(m, k) + 1):   # going left
        left_pos = k - p
        right_pos = min(n - 1, max(k, k + m - 2 * p))
        result = max(result, count_total(pref, left_pos, right_pos))
    for p in xrange(min(m + 1, n - k)):
        right_pos = k + p
        left_pos = max(0, min(k, k - (m - 2 * p)))
        result = max(result, count_total(pref, left_pos, right_pos))
    return result   

I have run some example for a small array A= [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] , chose the position k=5 and the range m = 3. I don't understand the logic of creating the ranges to check by the two loops.

I get the following parameters for the loops

(p=, left_pos=, right_pos=)   
loop 1  (0,5,8), (1,4,6),(2,3,5),(3,2,5)
loop 2  (0,2,5), (1,4,6), (2,5,7), (3,5,8)

The rangies vary. Why?

version for debugging

def mushrooms2(A, k, m):
    n = len(A)
    result = 0
    pref = prefix_sums(A)
    l1 =min(m, k) + 1
    print 'loop p in xrange(min(m, k) + 1): %d' % l1
    for p in xrange(min(m, k) + 1):
        print 'p %d' % p
        print 'A= %r' % A
        print 'pref= %r' % pref
        left_pos = k - p
        right_pos = min(n - 1, max(k, k + m - 2 * p))
        result = max(result, count_total(pref, left_pos, right_pos))
        print 'left_pos = k - p= %d' % left_pos
        print 'right_pos= min(n-1,max(k,k+m-2*p))= %d' % right_pos
        print 'max'
        print '(result %d' % result
        print 'count_total(pref, left_pos, right_pos)) %r, %r, %r, %r' % (pref,left_pos, right_pos,count_total(pref, left_pos, right_pos))
        print 'result= %d' % result
        print 'next p'
    l2=min(m + 1, n - k)
    print   'loop xrange(min(m + 1, n - k)): %d' % l2
    for p in xrange(min(m + 1, n - k)):
        print 'p %d' % p
        print 'A= %r' % A
        print 'pref= %r' % pref
        right_pos = k + p
        left_pos = max(0, min(k, k - (m - 2 * p)))
        result = max(result, count_total(pref, left_pos, right_pos))
        print 'right_pos = k + p= %d' % right_pos
        print 'left_pos = max(0, min(k, k - (m - 2 * p)))= %d' % left_pos
        print 'max'
        print '(result %d' % result
        print 'count_total(pref, left_pos, right_pos)) %r, %r, %r, %r' % (pref,left_pos, right_pos,count_total(pref, left_pos, right_pos))
        print 'result= %d' % result
        print 'next p'
    print 'result %d' % result
    return result

Solution

  • You are not alone in considering the loop construction to be counter-intuitive, as I had to spend a few minutes on it as well. Here's what I figured out.

    Now, the solution in the link you provided further details the optimal strategy is walking on path in such a way that one changes directions only once. In that manner, one is able to cover a range with left and right endpoints, which left_pos and right_pos seems to represent.

    As to the particulars of the loops, instead of thinking of the loop in terms of the loop variables(i.e. p) it is easier to figure out what changes through the course of the loop, and how p is used. Otherwise, figuring out what is in those min and max expressions seems a bit too peculiar in the beginning.

    For instance, in the first loop, instead of figuring out what that range represents, try how left_pos is affected by different values p gets. After a bit of thinking, one notices that left_pos changes in a manner complying to the possible left endpoints.

    Specifically, when p == 0, left endpoint is the starting index(i.e. k) and when p is min(m, k), then it is either 0(i.e. if k < m) or (k - m). In the former case, that is as far as the left endpoint can go, as it would get out of the valid range of spots on the road. In the latter case, the number of moves prohibit any solution with a left_pos smaller than (k - m) since it is impossible to go from k to those indices in m moves.

    The assignment made to right_pos in the first loop can be explained similarly. min statement includes (n-1), which is the rightmost legal index that can be reached and it serves to keep the right endpoint in the allowed limits. The inner max statement features k, as it is the least possible value for right_pos. (i.e. due to k being the starting point) It also has an expression (k + m - 2 * p). This expression represents the following process:

    • Go to left for p moves.
    • Change direction, and go to right for p moves to reach the starting point.
    • Go to right with the remaining (m - 2p) moves.

    The second loop is just the reflection of this first loop, and you may explain it simply by adapting my explanation of the first loop.

    As to your second question, I do not think it is common practice to shift the indices for prefix sum arrays. I typically use this method in online programming contests and my implementation of the prefix sum array you use in Python would be as follows.

    def prefix_sums(A):
        n = len(A)
        P = [0] * n
        P[0] = A[0]
        for k in xrange(1, n):
            P[k] = P[k - 1] + A[k]
        return P
    
    def count_total(P, x, y):
        return (P[y] - P[x - 1] if x > 0 else P[y])
    

    The fundamental idea behind the implementation above is that, at P[x], we have the sum A[0] + A[1] + ... + A[x].