My goal is to assign to each pixel of an image (in my case a numpy array) the frequency of the bin it falls into. For example, I have values like [0.5, 1, 2, 2, 4] and bin edges like [0, 1.5, 2.5, 3.5, 4.5]. So the frequency for the first bin is 2, for the second 2, for the third 0 and for the fourth 1, and the result should be [2, 2, 2, 2, 1].
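To make the intended mapping concrete, here is a minimal sketch on the toy data above (my own illustration, not the code I ran on the image; it assumes all values fall strictly below the last edge, since np.histogram closes its last bin on the right):

import numpy as np

values = np.array([0.5, 1, 2, 2, 4])
edges = np.array([0, 1.5, 2.5, 3.5, 4.5])

hist, _ = np.histogram(values, bins=edges)  # -> [2, 2, 0, 1]
# np.digitize returns 1-based indices for in-range values, so shift by one
idx = np.digitize(values, edges) - 1        # -> [0, 0, 1, 1, 3]
print(hist[idx])                            # -> [2 2 2 2 1]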
My plan was to first use numpy.histogram() to get the frequencies and bin edges, and then use numpy.digitize() with those bin edges to assign each pixel the index of the bin it falls into. I then want to use these indices to look up the corresponding frequency in hist. But I encountered the problem that numpy.digitize() gives me more bin indices than there are bins in hist, and I cannot tell why.
My code looks like this:
First, I have an image (a numpy array) which looks like this:
import numpy as np

a_noise = np.random.normal(0, 1, 40000).reshape((200, 200))
Next, I take the histogram of it:
hist, bin_edges = np.histogram(a_noise, bins='fd')
Now I use np.digitize to assign the bin indices to the pixels.
a_binidx = np.digitize(a_noise, bin_edges, right=True)
As a result I get:
hist.shape
(109,)
So there are 109 bins in total, and the possible indices range from 0 to 108.
bin_edges.shape
(110,)
And there are 110 bin edges in total, which makes sense to me.
But when I check which bin indices were actually assigned, I get:
np.unique(a_binidx)
array([ 0, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
The highest index is 109, but the highest valid index for hist is 108. Why do I get indices up to 109 instead of 108?
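If I read the documentation correctly, with right=True a value equal to the lowest edge gets index 0, and every other in-range value x gets the index i with bin_edges[i-1] < x <= bin_edges[i], so n edges can produce n distinct indices, one more than the number of histogram bins. A tiny sketch of my understanding:

import numpy as np

edges = np.array([0.0, 1.0, 2.0])            # 3 edges -> 2 histogram bins
vals = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
# x <= edges[0] -> 0; edges[i-1] < x <= edges[i] -> i
print(np.digitize(vals, edges, right=True))  # -> [0 1 1 2 2]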
I solved the problem using pandas.cut():
import pandas as pd

a_binidx = pd.cut(a_noise.flatten(), bins=bin_edges, labels=np.arange(hist.shape[0]), include_lowest=True)
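Since pd.cut returns a Categorical here, I convert it back to a plain integer array before using it to index hist, and then restore the image shape (this is just how I use the result; names as above):

a_binidx = np.asarray(a_binidx, dtype=int)      # Categorical -> int array
a_freq = hist[a_binidx].reshape(a_noise.shape)  # per-pixel bin frequency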