
What exactly does adding more bins to `np.histogram` do?


What exactly does adding more bins to np.histogram(data, bins=100) do? I know that it divides the data into the number of bins you specify, but what exactly does that entail? For example, I have a histogram, and I fitted a best-fit line to it using scipy.optimize.curve_fit. When I increased the number of bins, the best-fit line also became more accurate.
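A minimal sketch of the workflow the question describes, assuming the data are roughly normal and the fitted curve is a Gaussian evaluated at the bin centers. The synthetic data, the `gaussian` model and the `fit_histogram` helper are hypothetical stand-ins, not the asker's code:

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, sigma):
    # Hypothetical model (assumption): a scaled normal curve
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)


def fit_histogram(data, num_bins):
    """Hypothetical helper: bin the data, then fit the model to the bin counts."""
    counts, edges = np.histogram(data, bins=num_bins)
    # One (x, y) point per bin: the bin center and its count
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Rough starting guess taken from the data itself
    p0 = [counts.max(), data.mean(), data.std()]
    params, _ = curve_fit(gaussian, centers, counts, p0=p0)
    return centers, counts, params


# Synthetic stand-in for the asker's data (assumption)
rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=10.0, size=1000)

for num_bins in (10, 100):
    centers, counts, params = fit_histogram(data, num_bins)
    print(f"bins={num_bins}: {len(centers)} points to fit, "
          f"largest count {counts.max()}, "
          f"fitted mean={params[1]:.2f}, sigma={abs(params[2]):.2f}")
```

Each bin contributes one (center, count) point to `curve_fit`, so `bins=100` gives the fit ten times as many points as `bins=10`. At the same time each bin holds fewer values, so the individual counts fluctuate more relative to their size. How those two effects balance depends on the sample size, so any accuracy comparison is worth checking on your own data.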


Solution

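  • With an integer `bins`, the range of the data (from its minimum to its maximum) is split into that many equal-width intervals, and each bar counts how many values fall in its interval. More bins means narrower intervals: the histogram shows finer detail, but each bar holds fewer values. The following function illustrates this using matplotlib, plotting the same data with 5 bins and with 10 bins: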

    import matplotlib.pyplot as plt
    def plot_histogram(num_bins):
        x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
             10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
             18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
             25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
             29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
             36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
             43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
             51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
             61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
             75, 77, 81, 83, 84, 87, 89, 90, 90, 91
             ]
    
        plt.hist(x, bins=num_bins)
        plt.title(f'{num_bins} bins')
        plt.show()
    
    plot_histogram(5)
    plot_histogram(10)
    

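    [Figure: histogram with 5 bins] With 5 bins over the range 1 to 91, each bin is 18 wide. The second bar shows 30 data points with values between 19 and 37.

    [Figure: histogram with 10 bins] With 10 bins, each bin is 9 wide, so that same interval is split in two and you get more detail: 19 data points fall between 19 and 28, and 11 data points fall between 28 and 37.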
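The counts behind those two plots can be checked without plotting by calling `np.histogram` directly, which returns the count in each bin together with the bin edges. A minimal sketch reusing the same 100 values; the helper name `bin_counts` is hypothetical:

```python
import numpy as np

# The same 100 values used in plot_histogram()
x = np.array([
    1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
    10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
    18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
    25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
    29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
    36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
    43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
    51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
    61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
    75, 77, 81, 83, 84, 87, 89, 90, 90, 91,
])


def bin_counts(num_bins):
    """Hypothetical helper: return the per-bin counts and the bin edges."""
    # With an integer `bins`, np.histogram splits [x.min(), x.max()]
    # into num_bins equal-width intervals
    counts, edges = np.histogram(x, bins=num_bins)
    return counts, edges


for num_bins in (5, 10):
    counts, edges = bin_counts(num_bins)
    print(f"{num_bins} bins, each {edges[1] - edges[0]:g} wide")
    for i, n in enumerate(counts):
        # Every bin is half-open [lo, hi) except the last, which is closed [lo, hi]
        closing = "]" if i == len(counts) - 1 else ")"
        print(f"  [{edges[i]:g}, {edges[i + 1]:g}{closing}: {n}")
```

For this data, the 5-bin counts are 21, 30, 23, 15, 11 and the 10-bin counts are 9, 12, 19, 11, 13, 10, 8, 7, 4, 7. Because every 5-bin interval is exactly two 10-bin intervals, each 5-bin count is the sum of two neighbouring 10-bin counts, e.g. 30 = 19 + 11.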