Search code examples
pythonperformancenumpyprobabilitycdf

How to compute 2D cumulative sum efficiently


Given a two-dimensional numerical array X of shape (m,n), I would like to compute an array Y of the same shape, where Y[i,j] is the cumulative sum of X[i_,j_] for 0<=i_<=i, 0<=j_<=j. If X describes a 2D probability distribution, Y could be thought of as the 2D cumulative distribution function (CDF).

I can obviously compute all entries of Y in a double for loop. However, there is a recursive aspect to this computation, as Y[i,j] = X[i,j] + Y[i-1,j] + Y[i,j-1] - Y[i-1,j-1] (where negative indexing means 0).

I was looking for "2d Python cumsum", and I've found that NumPy's cumsum merely flattens the array.

My Questions:

  1. Is there a standard Python function for computing Y efficiently?
  2. If not, is the recursive idea above optimal?

Thanks.


Solution

  • A kernel splitting method can be applied here to solve this problem very efficiently with only two np.cumsum: one vertical and one horizontal (or the other way since this is symmetric).

    Here is an example:

    x = np.random.randint(0, 10, (4, 5))
    print(x)
    y = np.cumsum(np.cumsum(x, axis=0), axis=1)
    print(y)
    

    Here is the result:

    [[1 9 8 1 7]
     [0 6 8 2 3]
     [1 3 6 4 4]
     [0 8 1 2 9]]
    
    [[ 1 10 18 19 26]
     [ 1 16 32 35 45]
     [ 2 20 42 49 63]
     [ 2 28 51 60 83]]