
Rebinning a list of numbers in Python


I've a question about rebinning a list of numbers, with a desired bin-width. It's basically what a frequency histogram does, but I don't want the plot, just the bin number and the number of occurrences for each bin.

So far I've already written some code that does what I want, but it's not very efficient. Given a list a, in order to rebin it with a bin-width equal to 3, I've written the following:

import numpy as np

# list of numbers
a = list(range(3000))

# number of entries
L = int(len(a))

# desired bin width
W = 3

# number of bins with width W
N = int(L/W)

# definition of new empty array
a_rebin = np.zeros((N, 2))

# cycles to populate the new rebinned array
for n in range(0,N):
    k = 0
    for i in range(0,L):
        if a[i] >= (W*n) and a[i] < (W+W*n):
            k = k+1
    a_rebin[n]=[W*n,k]

# print
print(a_rebin)

This does exactly what I want, but it is not very smart: it reads the whole list N times, where N is the number of bins. That is fine for small lists, but I have to deal with very large lists and rather small bin widths, so N becomes huge and the whole process takes a very long time (hours...). Do you have any ideas to improve this code? Thank you in advance!


Solution

  • If you use a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], your solution is:

    [[ 0. 3.]
    [ 3. 3.]
    [ 6. 3.]]

    How do you interpret this? Are the intervals 0..2, 3..5 and 6..8? Then the value 9 is not counted in any bin, so I think you are missing something.

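    Using numpy.histogram(), where an integer bins argument is the number of equal-width bins between the minimum and maximum of a: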

    import numpy
    a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    W = 3
    hist, bin_edges = numpy.histogram(a, bins=len(a) // W)
    print(hist)
    print(bin_edges)
    

    Output:

    [3 3 4]
    [ 0. 3. 6. 9.]

    We have 4 values in bin_edges: 0, 3, 6 and 9. All bins but the last (right-most) one are half-open. That means we have 3 intervals, [0,3), [3,6) and [6,9], containing 3, 3 and 4 elements respectively.
    You can also define your own bin edges.

    import numpy
    a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    bins=[0,1,2]
    hist, bin_edges = numpy.histogram(a, bins=bins)
    print(hist)
    print(bin_edges)
    

    Output:

    [1 2]
    [0 1 2]

    Now you have 1 element in [0,1) and 2 elements in [1,2].
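
If the bins must have exactly the width W and keep the half-open counting of the loop in the question, one option is to pass explicit edges to numpy.histogram(). The following is a minimal sketch, assuming the bins start at 0 as in the question; rebin is a hypothetical helper name, not part of NumPy. One extra edge is appended and its bin is discarded, so the last bin that is kept stays half-open:

```python
import numpy as np

def rebin(values, width):
    """Count values in consecutive half-open bins [k*width, (k+1)*width).

    Hypothetical helper: returns an (N, 2) array of [bin start, count]
    rows, the same layout as a_rebin in the question. Bins start at 0.
    """
    values = np.asarray(values)
    n_bins = len(values) // width  # same number of bins as N in the question
    # n_bins + 1 edges would make the last bin closed on the right,
    # so add one more edge and drop the extra bin afterwards.
    edges = np.arange(n_bins + 2) * width
    counts, _ = np.histogram(values, bins=edges)
    return np.column_stack((edges[:n_bins], counts[:n_bins]))

a = list(range(10))
print(rebin(a, 3))  # rows: [0, 3], [3, 3], [6, 3]
```

For a = list(range(3000)) and W = 3 this gives 1000 rows, each with a count of 3, without looping over the list once per bin.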