Search code examples
pythonnumpyhistogrambinning

Sorting 2d array into bins and add weights in each bin


Suppose I have a series of 2d coordinates (x, y), each corresponding to a weight. After I arrange them into bins (i.e. a little square area), I want to find the sum of the weights that fall into each bin. I used np.digitize to find which bins my data falls into, then I added weights in each bin using a loop. My code is like this:

import numpy as np

x = np.random.uniform(low=0.0, high=10.0, size=5000) #x variable
y = np.random.uniform(low=0.0, high=10.0, size=5000) #y variable
w = np.random.uniform(low=0.0, high=10.0, size=5000) #weight at each (x,y)

binx = np.arange(0, 10, 1)
biny = np.arange(0, 10, 1)

indx = np.digitize(x, binx)
indy = np.digitize(y, biny)

#initialise empty list
weight = [[0] * len(binx) for _ in range(len(biny))]

for n in range(0, len(x)):
    for i in range(0, len(binx)):
        for j in range(0, len(biny)):
            if indx[n] == binx[i] and indy[n] == biny[j]:
                weight[i][j] =+ w[n]

But the first line of the output weight is empty, which doesn't make sense. Why does this happen? Is there a more efficient way to do what I want?

Edit: I have a good answer below (one I accepted), but I wonder how it works if I have bins as floats?--> See edited answer


Solution

  • You can do this with simple indexing. First get the bin number in each direction. You don't need np.digitize for evenly spaced bins:

    xbin = (x // 1).astype(int)
    ybin = (y // 1).astype(int)
    

    Now make an output grid:

    grid = np.zeros_like(w, shape=(xbin.max() + 1, ybin.max() + 1))
    

    Now the trick to getting the addition done correctly with repeated bins is to do it in unbuffered mode. Ufuncs like np.add have a method at just for this purpose:

    np.add.at(grid, (xbin, ybin), w)
    

    Appendix

    This approach is completely general for any even-sized bins. Let's say you had

    x = np.random.uniform(low=-10.0, high=10.0, size=5000)
    y = np.random.uniform(low=-7.0, high=12.0, size=5000)
    
    xstep = 0.375
    ystep = 0.567
    

    Let's say you wanted to compute your bins starting with x.min() and y.min(). You could use a fixed offset instead, and even apply np.clip to out-of bounds indices, but that will be left as an exercise for the reader.

    xbin = ((x - x.min()) // xstep).astype(int)
    ybin = ((y - y.min()) // ystep).astype(int)
    

    Everything else should be the same as the original simplified case.

    When plotting the histogram, your x- and y-axes would be

    xax = np.linspace(x.min(), x.min() + xstep * xbin.max(), xbin.max() + 1) + 0.5 * xstep
    yax = np.linspace(y.min(), y.min() + ystep * ybin.max(), ybin.max() + 1) + 0.5 * ystep
    

    I avoided using np.arange here to minimize roundoff error.