I have the data set below (Data) and use np.histogram to get n, the number of points in each bin (the frequency). I then divide each frequency by the total number of points to get the probability of each bin (bin_probability).
Now I want the probability for each individual point, as a list. For example, if point 1 is in bin 1, its probability is the first value in the array, 0.65; if point 2 is in bin 5, its probability is 0.05, and so on. How do I map each point to the bin_probability of its bin so that I end up with one probability per point (20 probabilities in this case)?
import numpy as np
Data = [4.33, 4.11, 6.33, 5.67, 3.24, 6.74, 24.6, 6.43, 4.122, 9.67, 9.99, 3.44, 5.66, 3.54, 5.34, 6.55, 5.78, 3.56, 1.55, 5.45]
n, bin_edges = np.histogram(Data, bins = 5)
totalcount = np.sum(n)
bin_probability = n / totalcount
print(bin_probability)
>> array([0.65, 0.3 , 0. , 0. , 0.05])
Many thanks for your help!
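Based on @kcsquared's link above, np.digitize can be used to find the bin that each point falls in. The variable 'bins_per_point' is an array of 20 elements, one per data point, holding the 1-based index of that point's bin. np.digitize assigns the maximum value an index one past the last bin, because np.histogram includes the right edge in its last bin while np.digitize does not, so np.fmin clamps that index back to the last bin. The variable 'probability_perpoint' then looks up the bin probability for each point.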
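# np.digitize puts the maximum value one past the last bin, so clamp it
bins_per_point = np.fmin(np.digitize(Data, bin_edges), len(bin_edges)-1)
# digitize indices start at 1, so subtract 1 to index bin_probability
probability_perpoint = bin_probability[bins_per_point - 1]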
>> array([0.65, 0.65, 0.3 , 0.65, 0.65, 0.3 , 0.05, 0.3 , 0.65, 0.3 , 0.3 ,
0.65, 0.65, 0.65, 0.65, 0.3 , 0.65, 0.65, 0.65, 0.65])
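To see why the np.fmin clamp matters, here is a minimal, self-contained sketch that compares the raw np.digitize index of the largest point with the clamped one. It assumes bins=5 (chosen to match the five probabilities printed in the question), and the names data, raw, clamped and probability_per_point are illustrative only.

```python
import numpy as np

# Same values as Data in the question.
data = np.array([4.33, 4.11, 6.33, 5.67, 3.24, 6.74, 24.6, 6.43, 4.122, 9.67,
                 9.99, 3.44, 5.66, 3.54, 5.34, 6.55, 5.78, 3.56, 1.55, 5.45])

# Assumption: bins=5, to match the five bin probabilities shown in the question.
n, bin_edges = np.histogram(data, bins=5)
bin_probability = n / n.sum()

raw = np.digitize(data, bin_edges)          # 1-based bin index per point
clamped = np.fmin(raw, len(bin_edges) - 1)  # pull the maximum back into the last bin

i_max = int(np.argmax(data))
print(raw[i_max], clamped[i_max])  # 6 5: without the clamp the index is out of range

# Vectorized lookup: one probability per data point.
probability_per_point = bin_probability[clamped - 1]
print(probability_per_point[i_max])  # probability of the last bin
```

Without the clamp, indexing bin_probability with raw - 1 would raise an IndexError for the point 24.6, since there is no sixth bin.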
To verify, the bin probabilities sum to 1 (up to floating-point rounding).
np.sum(bin_probability)
>> 1
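The sum only confirms that bin_probability is normalized. A stricter check, sketched below, is to recount the points per bin from the np.digitize assignment and compare that with the frequencies from np.histogram. This again assumes bins=5, and the names data and recount are illustrative.

```python
import numpy as np

# Same values as Data in the question.
data = np.array([4.33, 4.11, 6.33, 5.67, 3.24, 6.74, 24.6, 6.43, 4.122, 9.67,
                 9.99, 3.44, 5.66, 3.54, 5.34, 6.55, 5.78, 3.56, 1.55, 5.45])

# Assumption: bins=5, to match the five bin probabilities shown in the question.
n, bin_edges = np.histogram(data, bins=5)
bins_per_point = np.fmin(np.digitize(data, bin_edges), len(bin_edges) - 1)

# Count how many points were assigned to each bin (0-based after subtracting 1).
recount = np.bincount(bins_per_point - 1, minlength=len(n))

# If the mapping is right, the recount reproduces the histogram frequencies.
print(np.array_equal(recount, n))
```

If this prints False, the per-point bin assignment disagrees with the histogram, which usually points to an off-by-one in the index or a missing clamp on the maximum value.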