Search code examples
cprefix-sum

prefixSum(A.K.A Scan) in C


This is an school assignment, and I'm asked to write a prefix sum method; the method needs to be "in place" and "Each iteration of the algorithm, one or more additions are performed in parallel". I have not yet implement the running in parallel part of it yet but i have both the up_sweep and down_sweep done

My concern is that it doesn't seem to give me the correct output e.g for an array of (1,2,3,4,5,6,7,8) the correct output should be (1, 3, 6, 10, 15, 21, 28, 36) but I'm getting (0, 1, 3, 6, 10, 15, 21, 28) instead. And here is what i have, please help ***Thanks chux to point this out up_sweep is going to be called on the original array, and the result will be put into down_sweep which will output the final outcome.

void up_sweep(int A[], int size) {
    int d;
    for (d = 0; d <= (log2(size) - 1); d++) {
        int by = pow(2, (d + 1));
        int partition_size = (size - 1) % by + 1;
        int k = 0;
        while (k < size - 1) {
            //temp = (int) k + ((int) pow(2, (d + 1)) - 1);
            A[(int) k + ((int) pow(2, (d + 1)) - 1)] += A[(int) k + (int) pow(2, d) - 1];
            k += partition_size;
        }
    }
}

void down_sweep(int A[], int size) {
    A[size -1] = 0;
    int depth;
    for (depth = (log2(size)-1); depth >= 0; depth--) {
        int by = pow(2, depth + 1);
        int partition_size = (size - 1) % by + 1;
        int k = 0;
        while (k < size - 1) {
            int temp = A[k + (int) pow(2, depth) - 1];
            A[k + (int) pow(2, depth) - 1] = A[k + (int) pow(2, (depth + 1)) - 1];
            A[k + (int) pow(2, (depth + 1)) - 1] += temp;
            k += partition_size;
        }
    }
}

Solution

  • If the algorithm to use were not fixed, this would be a simpler "straightforward" way to achieve the goal:

    void prefixSum(int A[], int size) {
        for (int i = size-2; i > 0; --i) {
            for (int j = i+1; j < size-1; ++j) {
                A[j] += A[i];
            }
        }
    }
    

    The inner j loop can easily be executed in parallel.

    However, if you want to stick to the parallel algorithm, this would be my suggestion to implement it:

    void prefixSum(int A[], int size)
    {
        // up-sweep
        for (int stepSize = 1; stepSize < size; stepSize *= 2) {
            for (int i = stepSize - 1; i < size - stepSize; i += 2*stepSize) {
                A[i+stepSize] += A[i];
            }
        }
    
        // down-sweep
        for (int stepSize = size / 4; stepSize > 0; stepSize /= 2) {
            for (int i = 2 * stepSize - 1; i < size - stepSize; i += 2*stepSize) {
                A[i+stepSize] += A[i];
            }
        }           
    }
    

    I used the stride variable directly for controlling the iteration instead of the "level" variable, this saves me from having to use log and pow function calls. You can again easily parallelize the inner i loops, and also split the function into separate up- and down-sweep steps if you prefer it that way.

    Note that this functions assumes that the size of the array is a power of 2. If you want to make it work with arbitrary array lengths, you will have to either pad the array up to the next power of 2 with zeros or apply the call recursively or manually adapt the iteration boundaries.