Tags: python, arrays, numpy, histogram, binning

Add numpy array elements/slices with same bin assignment


I have an array A, and the corresponding elements of the array bins give each row's bin assignment. I want to construct an array S such that row k of S is the sum of the rows of A assigned to bin k. For bin 0, for example:

S[0, :] = (A[(bins == 0), :]).sum(axis=0)

This is rather easy to do with np.stack and list comprehensions, but it seems overly complicated and not terribly readable. Is there a more general way to sum (or even apply some general function to) slices of arrays with bin assignments? scipy.stats.binned_statistic is along the right lines, but it requires the bin assignments and the values being aggregated to have the same shape (since I am binning whole rows, this is not the case).

For example, if

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])

and

bins = np.array([0, 1, 0, 2])

then it should result in

S = np.array([[10., 10., 10., 10.],
              [2.,  3.,  4.,  5. ],
              [8.,  7.,  6.,  5. ]])
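
For reference, the np.stack and list-comprehension baseline mentioned in the question might look like the sketch below. The name n_bins is introduced here for illustration, and the sketch assumes bin labels are the integers 0 through bins.max() with every label used at least once.

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

# Assumption: labels are 0..bins.max(); n_bins is a name chosen for this sketch.
n_bins = bins.max() + 1

# One boolean row selection and one column-wise sum per bin, stacked into S.
S = np.stack([A[bins == k].sum(axis=0) for k in range(n_bins)])
print(S)
```

This loops over the bins in Python, so it is easy to swap .sum for another reduction, but it scans bins once per bin.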

Solution

  • Here's an approach with matrix multiplication using np.dot -

    (bins == np.arange(bins.max()+1)[:,None]).dot(A)
    

    Sample run -

    In [40]: A = np.array([[1., 2., 3., 4.],
        ...:               [2., 3., 4., 5.],
        ...:               [9., 8., 7., 6.],
        ...:               [8., 7., 6., 5.]])
    
    In [41]: bins = np.array([0, 1, 0, 2])
    
    In [42]: (bins == np.arange(bins.max()+1)[:,None]).dot(A)
    Out[42]: 
    array([[ 10.,  10.,  10.,  10.],
           [  2.,   3.,   4.,   5.],
           [  8.,   7.,   6.,   5.]])
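
To see why the one-liner works, it may help to look at the intermediate mask. This is only an illustrative sketch using the sample data above: broadcasting bins against a column of bin labels gives a boolean matrix whose entry [k, i] is True exactly when row i of A belongs to bin k, so multiplying it by A adds up the matching rows.

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

# Shape (n_bins, n_rows): row k marks which rows of A fall in bin k.
mask = bins == np.arange(bins.max() + 1)[:, None]
print(mask.astype(int))
# [[1 0 1 0]
#  [0 1 0 0]
#  [0 0 0 1]]

# Each output row is the sum of the rows of A selected by one mask row.
S = mask.dot(A)
print(S)
```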
    

    Performance boost

    A more efficient way to create the mask (bins == np.arange(bins.max()+1)[:,None]) is to allocate it once and set the matching entries by index -

    mask = np.zeros((bins.max()+1, len(bins)), dtype=bool)
    mask[bins, np.arange(len(bins))] = 1
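
Putting the pieces together, here is a hedged sketch of the indexed-mask version wrapped in a function, next to a different technique that is not part of the answer above: np.add.at, which accumulates rows directly and avoids building the (number of bins) x (number of rows) mask. The function names bin_sums_mask and bin_sums_add_at are hypothetical, and both assume integer bin labels starting at 0 and a 2-D floating-point A.

```python
import numpy as np

def bin_sums_mask(A, bins):
    """Sum rows of A per bin using an index-built boolean mask and a dot product."""
    n_bins = bins.max() + 1
    mask = np.zeros((n_bins, len(bins)), dtype=bool)
    # Mark entry [bin of row i, i] for every row i.
    mask[bins, np.arange(len(bins))] = True
    return mask.dot(A)

def bin_sums_add_at(A, bins):
    """Alternative sketch: unbuffered in-place accumulation with np.add.at."""
    n_bins = bins.max() + 1
    S = np.zeros((n_bins, A.shape[1]), dtype=A.dtype)
    # Adds A[i] into S[bins[i]]; repeated bin labels accumulate correctly.
    np.add.at(S, bins, A)
    return S

A = np.array([[1., 2., 3., 4.],
              [2., 3., 4., 5.],
              [9., 8., 7., 6.],
              [8., 7., 6., 5.]])
bins = np.array([0, 1, 0, 2])

print(bin_sums_mask(A, bins))
print(bin_sums_add_at(A, bins))
```

Which one is faster likely depends on the array sizes, so it is worth timing both on real data. Other ufuncs expose the same .at method (np.maximum.at, for instance), which covers some reductions beyond sums.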