What exactly does increasing the number of bins in np.histogram(data, bins=100) do? I know that it divides the data into the number of bins you specify, but what does that actually entail? For example, I have a histogram and I fitted a best-fit curve to it using scipy.optimize.curve_fit, and when I increased the number of bins, the fit also became more accurate.
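For reference, here is a minimal sketch of the kind of fit described above. It assumes a Gaussian model, density-normalised counts and synthetic normally distributed data; none of these come from the question, and the helper names gaussian and fit_histogram are hypothetical. Each bin contributes one (bin center, bin height) point to curve_fit, so bins=100 gives the optimizer 100 points instead of 10, and narrower bins blur the shape of the distribution less. The trade-off is that each narrow bin holds fewer samples, so its height is noisier.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, mu, sigma):
    # Hypothetical model: an unnormalised Gaussian bump.
    return amplitude * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fit_histogram(data, num_bins):
    # Bin the data; density=True scales heights so the histogram integrates to 1.
    heights, edges = np.histogram(data, bins=num_bins, density=True)
    # One x value per bin: the midpoint between its left and right edges.
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Rough starting guesses taken from the raw data.
    p0 = [heights.max(), np.mean(data), np.std(data)]
    params, _ = curve_fit(gaussian, centers, heights, p0=p0)
    return centers, params

# Synthetic data (assumption): 10,000 samples with mean 5 and standard deviation 2.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

results = {}
for num_bins in (10, 100):
    centers, (amplitude, mu, sigma) = fit_histogram(data, num_bins)
    results[num_bins] = (centers, (amplitude, mu, abs(sigma)))
    print(f"{num_bins:>3} bins: {len(centers)} points fitted, "
          f"mu={mu:.3f}, sigma={abs(sigma):.3f}")
```

How much the extra bins help depends on how much data you have: with few samples, very narrow bins can make the fit worse rather than better.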
The following function illustrates the difference using matplotlib. The same data is plotted using 5 bins and 10 bins:
```python
import matplotlib.pyplot as plt

def plot_histogram(num_bins):
    x = [1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
         10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
         18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
         25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
         29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
         36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
         43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
         51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
         61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
         75, 77, 81, 83, 84, 87, 89, 90, 90, 91]
    plt.hist(x, bins=num_bins)
    plt.title(f'{num_bins} bins')
    plt.show()

plot_histogram(5)
plot_histogram(10)
```
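In the 5-bin plot, the data range (1 to 91) is split into bins 18 units wide, and the bar covering roughly 20 to 40 (exactly 19 to 37) contains 30 data points.
In the 10-bin plot, each bin is only 9 units wide, so you get more detail: that same range is split into two bars, with 19 data points between 19 and 28 and 11 data points between 28 and 37.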
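The same numbers can be checked without plotting, since plt.hist bins the data the same way np.histogram does. A minimal sketch reusing the data from plot_histogram: with an integer bins argument the edges are evenly spaced from min(x) to max(x), and because every 5-bin edge is also a 10-bin edge, each wide bar is exactly the sum of two narrow ones.

```python
import numpy as np

# Same 100 values as in plot_histogram.
x = np.array([1, 1, 2, 3, 3, 5, 7, 8, 9, 10,
              10, 11, 11, 13, 13, 15, 16, 17, 18, 18,
              18, 19, 20, 21, 21, 23, 24, 24, 25, 25,
              25, 25, 26, 26, 26, 27, 27, 27, 27, 27,
              29, 30, 30, 31, 33, 34, 34, 34, 35, 36,
              36, 37, 37, 38, 38, 39, 40, 41, 41, 42,
              43, 44, 45, 45, 46, 47, 48, 48, 49, 50,
              51, 52, 53, 54, 55, 55, 56, 57, 58, 60,
              61, 63, 64, 65, 66, 68, 70, 71, 72, 74,
              75, 77, 81, 83, 84, 87, 89, 90, 90, 91])

counts5, edges5 = np.histogram(x, bins=5)
counts10, edges10 = np.histogram(x, bins=10)

print(edges5.tolist())    # [1.0, 19.0, 37.0, 55.0, 73.0, 91.0]
print(counts5.tolist())   # [21, 30, 23, 15, 11]
print(edges10.tolist())   # [1.0, 10.0, 19.0, 28.0, 37.0, 46.0, 55.0, 64.0, 73.0, 82.0, 91.0]
print(counts10.tolist())  # [9, 12, 19, 11, 13, 10, 8, 7, 4, 7]

# Summing adjacent pairs of narrow bins recovers the wide bins (e.g. 19 + 11 = 30).
assert (counts10.reshape(5, 2).sum(axis=1) == counts5).all()
```

More bins never changes the data; it only chooses finer boundaries, so each count describes a narrower slice of the range.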