I've a question about rebinning a list of numbers, with a desired bin-width. It's basically what a frequency histogram does, but I don't want the plot, just the bin number and the number of occurrences for each bin.
So far I've already written some code that does what I want, but it's not very efficient. Given a list a, in order to rebin it with a bin-width equal to 3, I've written the following:
import os, sys, math
import numpy as np
# list of numbers
a = list(range(3000))
# number of entries
L = int(len(a))
# desired bin width
W = 3
# number of bins with width W
N = int(L/W)
# definition of new empty array
a_rebin = np.zeros((N, 2))
# cycles to populate the new rebinned array
for n in range(0,N):
    k = 0
    for i in range(0,L):
        if a[i] >= (W*n) and a[i] < (W+W*n):
            k = k+1
    a_rebin[n]=[W*n,k]
# print
print(a_rebin)
Now, this does exactly what I want, but I think it's not so smart, as it reads the whole list N times, where N is the number of bins. It's fine for small lists, but I have to deal with very large lists and rather small bin-widths, which translates into huge values of N, and the whole process takes a very long time (hours...). Do you have any ideas to improve this code? Thank you in advance!
If you use a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], your solution gives:
[[ 0. 3.]
[ 3. 3.]
[ 6. 3.]]
How do you interpret this? Are the intervals 0..2, 3..5, 6..8? I think you are missing something: the value 9 is not counted in any bin.
Using numpy.histogram()
import numpy
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
W = 3  # desired bin width
hist, bin_edges = numpy.histogram(a, bins=int(len(a)/W))
print(hist)
print(bin_edges)
Output:
[3 3 4]
[ 0. 3. 6. 9.]
bin_edges holds 4 values: 0, 3, 6 and 9. Every bin except the last (right-most) one is half-open. That gives 3 intervals, [0,3), [3,6) and [6,9], containing 3, 3 and 4 elements respectively.
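One caveat, shown as a small sketch: when bins is an integer, numpy.histogram splits the range from min(a) to max(a) into that many equal-width bins, so the bin width equals W only when the data happen to line up, as they do for 0 to 9. The 11-element list below is an illustrative assumption, not data from the question.

```python
import numpy as np

# Illustrative data (an assumption): 11 values, 0..10
a = list(range(11))
W = 3  # desired bin width

# An integer bin count means equal-width bins spanning [min(a), max(a)]
hist, bin_edges = np.histogram(a, bins=len(a) // W)

# Actual width of each bin; here it is 10/3, not the requested 3
widths = np.diff(bin_edges)
print(bin_edges)
print(widths)
```

If the width itself has to be exactly W, passing explicit edges is probably the safer route.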
You can define your own bins.
import numpy
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
bins=[0,1,2]
hist, bin_edges = numpy.histogram(a, bins=bins)
print(hist)
print(bin_edges)
Output:
[1 2]
[0 1 2]
Now you have 1 element in [0,1) and 2 elements in [1,2].
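Building on the custom-bins idea, here is a minimal sketch of how the question's output (bin start, count) might be reproduced with explicit edges of width W. The helper name rebin is hypothetical, and the sketch assumes non-negative values with bins starting at 0, as in the question's loop.

```python
import numpy as np

def rebin(values, width):
    """Return an (n_bins, 2) array of [bin start, count] rows.

    Hypothetical helper: assumes non-negative values and bins starting at 0.
    """
    values = np.asarray(values)
    # Enough bins that the maximum lies strictly below the last edge
    n_bins = int(values.max() // width) + 1
    # Edges at 0, width, 2*width, ..., n_bins*width
    edges = width * np.arange(n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    # Pair each bin's left edge with its count
    return np.column_stack((edges[:-1], counts))

a_rebin = rebin(list(range(3000)), 3)
print(a_rebin[:3])
```

Unlike the loop in the question, this also counts values that fall into a final partial bin (for 0..9 with W = 3 it adds a row for the bin starting at 9), and it avoids the Python-level pass over the whole list for every bin.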