python numpy binning discretization

Quickly assign an array to n bins with equal length


For example, I have a stream of arrays whose values range from 0.0 to 10.0 inclusive.

I want to quickly assign the numbers in each array to 5 bins of equal length.

By equal length I mean the bin intervals are [0.0, 2.0), [2.0, 4.0), [4.0, 6.0), [6.0, 8.0), [8.0, 10.0].

The problem is that the last interval is closed on the right (it includes 10.0), unlike the other intervals, which are half-open.

Test:

import numpy as np
# Things we know and can pre-calculate
n_bins = 5
minimal = 0.0  
maximal = 10.0
reciprocal_bin_length = n_bins / (maximal - minimal)

# Let's say the stream gives 1001 numbers every time.
data = np.arange(1001)/100

norm_data = (data - minimal) * reciprocal_bin_length
norm_data = norm_data.astype(int)
print(norm_data.max())
print(norm_data.min())

Result:

5
0

The bin index should be 0, 1, 2, 3, or 4, but not 5.
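To see where the stray index comes from, a quick check along these lines can help. It is only a sketch that reuses the setup from the test above; the name overflow is introduced here for illustration and is not part of the original code.

```python
import numpy as np

# Same setup as in the test above
n_bins = 5
minimal, maximal = 0.0, 10.0
reciprocal_bin_length = n_bins / (maximal - minimal)

data = np.arange(1001) / 100
norm_data = ((data - minimal) * reciprocal_bin_length).astype(int)

# Positions whose bin index falls outside 0 .. n_bins - 1
overflow = np.flatnonzero(norm_data >= n_bins)
print(overflow, data[overflow])
```

With this data, only the value equal to maximal lands in the extra bin, because (maximal - minimal) * reciprocal_bin_length is exactly n_bins.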


Solution

  • A "poor man's solution" is to take the element-wise minimum of your array norm_data and n_bins - 1:

    norm_data = np.minimum(norm_data, n_bins - 1)

    This converts every 5 (and anything above) into 4. Note that this does no proper range check: an out-of-range value such as 120.0 also ends up in bin 4.
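If the missing range check matters, one option is to validate the input before clamping. The following is a minimal sketch under that assumption; the function name assign_bins and the choice to raise ValueError are illustrative and not part of the original answer. NaN values are not handled here.

```python
import numpy as np

def assign_bins(data, minimal, maximal, n_bins):
    """Map values in [minimal, maximal] to bin indices 0 .. n_bins - 1.

    Hypothetical helper: rejects out-of-range input, then folds the
    right edge (data == maximal) into the last bin.
    """
    data = np.asarray(data, dtype=float)
    # Explicit range check, so values like 120.0 are not silently clamped
    if data.size and (data.min() < minimal or data.max() > maximal):
        raise ValueError("data contains values outside [minimal, maximal]")
    reciprocal_bin_length = n_bins / (maximal - minimal)
    idx = ((data - minimal) * reciprocal_bin_length).astype(int)
    # Only data == maximal yields n_bins here; clamp it to n_bins - 1
    return np.minimum(idx, n_bins - 1)

bins = assign_bins(np.arange(1001) / 100, 0.0, 10.0, 5)
print(bins.min(), bins.max())
```

The two extra reductions (min and max) cost one more pass over the data each, so for a fast stream where the range is already guaranteed, the plain np.minimum line above is likely the cheaper choice.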