python numpy histogram binning

Excluding rightmost edge in numpy.histogram


I have a list of numbers a and a list of bins, which I use to bin the numbers in a with numpy.histogram. The bins are calculated from the mean and standard deviation (std) of a: there are B bins, the lower edge of the first bin is mean - std, and the upper edge of the last bin is mean + std. (The text in bold indicates my final goal)

An example goes like the following:

>>> a
array([1, 1, 3, 2, 2, 6])

>>> bins = np.linspace(mean - std, mean + std, B + 1)
>>> bins
array([ 0.79217487,  1.93072496,  3.06927504,  4.20782513])

>>> np.histogram(a, bins=bins)[0]
array([2, 3, 0], dtype=int32)

However, I want to exclude the rightmost edge of the last bin, i.e. if some value in a exactly equals mean + std, I do not want it counted in the last bin. The detail about mean and std is not important; excluding the rightmost edge (making the last bin a half-open interval) is. Unfortunately, the documentation says:

All but the last (righthand-most) bin is half-open. In other words, if bins is:

[1, 2, 3, 4] then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
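
The documented behaviour can be checked directly. This is a minimal sketch with made-up values (the names `edges` and `values` are only for illustration): one value sits on every edge, including the rightmost one.

```python
import numpy as np

# Illustrative data: one value on each edge, including the rightmost edge 4.
edges = [1, 2, 3, 4]
values = np.array([1, 2, 3, 4])

counts, _ = np.histogram(values, bins=edges)

# Bins are [1, 2), [2, 3) and [3, 4]; both 3 and 4 land in the last bin.
print(counts)  # [1 1 2]
```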

Is there a simple solution I can employ? That is, one that does not involve manually fixing edges. That is something I can do, but that's not what I'm looking for. Is there a flag I can pass or a different method I can use?


Solution

  • Here's one (kind of crude?) way to make the last bin half-open instead of closed. What I'm doing is subtracting a tiny amount (machine epsilon) from the right edge of the right-most bin:

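    import numpy as np

    a = np.array([1, 1, 3, 2, 2, 6])
    B = 3  # number of bins (in this example)
    bins = np.linspace(a.mean() - a.std(), a.mean() + a.std(), B + 1)
    # array([ 0.79217487,  1.93072496,  3.06927504,  4.20782513])
    # <== this is the crucial line; note that eps is the float spacing near 1.0,
    # so for larger edges the subtraction can round back to the same value
    bins[-1] -= np.finfo(float).eps
    np.histogram(a, bins=bins)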
    

    If you're using a type other than float for the values in a, use a different type in the call to finfo. For example:

    np.finfo(float).eps
    np.finfo(np.float128).eps  # np.float128 exists only on some platforms; np.longdouble is the portable name
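
One caveat, offered as a sketch rather than a definitive fix: `np.finfo(float).eps` is the gap between 1.0 and the next larger float. For an edge of about 4.2, as in this example, subtracting it rounds back to the same value and the last bin stays closed. `np.nextafter` steps to the adjacent representable float at any magnitude. The array `a_with_edge` below is made up for illustration so that one value sits exactly on the right edge.

```python
import numpy as np

a = np.array([1, 1, 3, 2, 2, 6], dtype=float)
B = 3  # number of bins, as in the example above
bins = np.linspace(a.mean() - a.std(), a.mean() + a.std(), B + 1)

# Hypothetical data: append a value that equals the rightmost edge exactly.
a_with_edge = np.append(a, bins[-1])

# At this magnitude, subtracting machine epsilon rounds back to the same float.
eps_changes_edge = (bins[-1] - np.finfo(float).eps) != bins[-1]

# Move the rightmost edge down to the next representable float instead.
half_open_bins = bins.copy()
half_open_bins[-1] = np.nextafter(bins[-1], -np.inf)

closed_counts, _ = np.histogram(a_with_edge, bins=bins)
half_open_counts, _ = np.histogram(a_with_edge, bins=half_open_bins)

print(eps_changes_edge)   # False
print(closed_counts)      # [2 3 1]
print(half_open_counts)   # [2 3 0]
```

If changing the edges is undesirable, another option that should give the same counts is to drop the values on the last edge before binning, e.g. `np.histogram(a[a < bins[-1]], bins=bins)`.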