python histogram data-analysis

Is there any way to put the data that falls outside of a histogram's bins into the outermost bins?


I am not very good at programming, and I am doing data analysis for a lab. I am trying to generate a histogram in Python with a fixed number of equal-width bins, which in theory ought to capture all of the data. However, around 10 data points tend to fall outside the histogram. Is there any way to put those data points into the outermost bins, depending on whether they are smaller or larger than the histogram's range?

I apologize if this question is too basic, or was answered elsewhere, but I probably lack the vocabulary to find an answer myself.

Here is the code in question. I think it uses a special toolkit made by my university, but I hope it is nonetheless intelligible:

#sort out N by histogram
binwidth = 2.5*sig_Q
low_center = min(Q_exp) + binwidth
def set_range(first_bin = low_center, bin_width = binwidth, Nbins = 10):
    """
    helper function to set the range and bin width
    input : first_bin = bin_center of the first bin, bin_width = the bin width, Nbins = total number of bins
    returns: a tuple that you can use in the range key word when defining a histogram.
    NOTE: for the histogram use the same number of bins:

    example:  h = histo( r, range = set_range(-5., 1, 11), bins = 11)

    this creates a histogram where the first bin is centered at -5. , the next at -4. etc. a total of 11 bins are
    created and the bin center of the last one is at 5. = first_bin + (Nbins-1)*bin_width
    """
    rmin = first_bin - bin_width/2.
    rmax = rmin + Nbins*bin_width
    return (rmin,rmax)


h = B.histo(Q_exp, range = set_range(low_center, bin_width = binwidth, Nbins = 10), bins = 10)
hx = h.bin_center
hy = h.bin_content
B.pl.ylabel("Counts", fontsize = 20)
B.pl.xlabel("Gaussian Deviates", fontsize = 20)
B.pl.title("Monte Carlo Millikan Oil-Drop Simulation", fontsize = 22)
h.plot()
B.pl.show()



Solution

  • I finally figured it out. My bin widths were just too small. Once I widened the bins and made "low_center" smaller, all of my data fell inside the histogram range.
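
As for the original question of folding outliers into the outermost bins, one possible approach is to clip the data to the histogram range before binning. Note that with low_center = min(Q_exp) + binwidth, the lower edge returned by set_range is min(Q_exp) + binwidth/2, so the smallest values always fall below the range. What follows is a minimal sketch using plain NumPy rather than the university toolkit. The sample data, the bin settings, and the names counts_raw and counts_clipped are made up for illustration. np.histogram counts values lying exactly on the last edge in the final bin; whether B.histo behaves the same way is an assumption.

```python
import numpy as np


def set_range(first_bin, bin_width, Nbins):
    """Same helper as in the question: (rmin, rmax) for Nbins equal-width bins."""
    rmin = first_bin - bin_width / 2.0
    rmax = rmin + Nbins * bin_width
    return (rmin, rmax)


# Hypothetical stand-in for Q_exp (not the real lab data).
Q_exp = np.array([0.0, 0.4, 1.0, 2.0, 3.0, 4.0, 5.0, 9.0])
binwidth = 1.0
Nbins = 4

# Same construction as in the question: the first bin center sits
# one bin width above the smallest data point.
low_center = Q_exp.min() + binwidth
rmin, rmax = set_range(low_center, binwidth, Nbins)  # (0.5, 4.5)

# Count how many points the histogram range misses on each side.
n_below = int(np.sum(Q_exp < rmin))  # 2 (the values 0.0 and 0.4)
n_above = int(np.sum(Q_exp > rmax))  # 2 (the values 5.0 and 9.0)

# Without clipping, np.histogram silently ignores out-of-range values.
counts_raw, edges = np.histogram(Q_exp, bins=Nbins, range=(rmin, rmax))

# Clipping moves every out-of-range value onto the nearest edge,
# so it is counted in the first or last bin.
clipped = np.clip(Q_exp, rmin, rmax)
counts_clipped, _ = np.histogram(clipped, bins=Nbins, range=(rmin, rmax))

print(counts_raw)      # [1 1 1 1]
print(counts_clipped)  # [3 1 1 3]
```

The clipped array can then be passed to the histogram call in place of Q_exp. Keep in mind that this turns the outermost bins into overflow bins, so their counts no longer describe only the data between their edges.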