I used this dataset:
lst = [81922.00557103065, 82887.70053475935, 80413.01627033792,
81708.86075949368, 82997.38219895288, 84641.50943396226,
81929.82456140351, 82632.24181360201, 77667.98418972333,
73726.47427854454, 86113.2075471698, 83232.98429319372,
79866.66666666667, 83833.74689826302, 81943.06930693069,
77898.64029666255, 77401.47783251232, 80607.59493670886,
78384.5126835781, 82608.69565217392, 80824.8730964467,
84163.70106761566, 74887.38738738738
]
Then statistics.stdev(lst)
gives 3096.28 and numpy.std(lst)
gives 3028.23, a difference of about 2.2%.
They are calculating two slightly different things.
The standard deviation is the square root of the variance. By default, NumPy computes the population variance, dividing by N,
whereas statistics applies Bessel's correction and divides by N – 1 instead:
import numpy as np

arr = np.array(lst)
# Population variance: divide by N (what np.std and np.var use by default)
var_ordinary = np.sum(np.abs(arr - arr.mean())**2) / arr.size
# Sample variance: divide by N - 1 (Bessel's correction, what statistics uses)
var_bessel = np.sum(np.abs(arr - arr.mean())**2) / (arr.size - 1)
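If you want NumPy itself to apply the correction, np.std and np.var accept a ddof ("delta degrees of freedom") parameter; ddof=1 makes the divisor N - 1 and reproduces the statistics result. A quick sketch with a small toy list (not the dataset above):

```python
import statistics

import numpy as np

data = [1, 2, 3, 4]  # toy list for illustration

pop = np.std(data)              # divides by N (default, ddof=0)
sample = np.std(data, ddof=1)   # divides by N - 1 (Bessel's correction)

# sample now matches statistics.stdev(data)
print(pop, sample, statistics.stdev(data))
```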
From the statistics
docs:
This is the sample variance s² with Bessel’s correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance.
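As a sanity check, the ratio between the two standard deviations should be sqrt(N / (N - 1)). The dataset has 23 values, which predicts the roughly 2.2% gap observed above:

```python
import math

n = 23  # number of values in lst
# stdev with Bessel's correction / population stdev = sqrt(N / (N - 1))
ratio = math.sqrt(n / (n - 1))
print(ratio)  # ~1.0225, i.e. the ~2.2% gap between 3096.28 and 3028.23
```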