OK, so I'm trying to use scipy's implementation of kstest to evaluate which distribution best fits the data. My understanding of how kstest works is that the statistic represents the probability of the null hypothesis (i.e., the probability returned is the probability that the model in question is wrong for the data). This works about as expected for a uniform distribution between 0.0 and 1.0:
a = np.random.uniform(size=4999)
print(scipy.stats.kstest(a, 'uniform', args=(0.0,1.0)))
KstestResult(statistic=0.010517039009963702, pvalue=0.63796173656227928)
However, when I shift the uniform distribution's bounds from (0.0, 1.0) to (2.0, 3.0), the K-S statistic is oddly high:
a = np.random.uniform(2.0, 3.0,size=4999)
print(scipy.stats.kstest(a, 'uniform', args=(2.0,3.0)))
KstestResult(statistic=0.66671700832788283, pvalue=0.0)
Shouldn't the value of the test statistic in the second case be low as well, since the parameters passed describe the distribution just as closely as before?
The numpy (used by you) and scipy.stats (used by kstest) versions of uniform work differently:
>>> np.random.uniform(2,3,5000).max()
2.9999333044165271
>>> stats.uniform(2,3).rvs(5000).max()
4.9995316751114043
In numpy the second parameter is interpreted as the upper bound; in scipy.stats it is the scale parameter, i.e. the width of the interval. So stats.uniform(2, 3) is uniform on [2, 5], not [2, 3].
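To illustrate, here is a small sketch (using a seeded generator, which is my addition, not the original code) showing that passing the width rather than the upper bound as the second element of args makes the test behave as expected:

```python
import numpy as np
from scipy import stats

# np.random.uniform(2, 3) draws from [2, 3). scipy's uniform(loc, scale)
# covers [loc, loc + scale], so the matching parameters for kstest are
# loc=2.0 and scale=1.0 (the width), not the upper bound 3.0.
rng = np.random.default_rng(0)
a = rng.uniform(2.0, 3.0, size=4999)

# Mismatched: args=(2.0, 3.0) tests against a uniform on [2, 5]
bad = stats.kstest(a, 'uniform', args=(2.0, 3.0))

# Matched: args=(2.0, 1.0) tests against a uniform on [2, 3]
good = stats.kstest(a, 'uniform', args=(2.0, 1.0))

print(bad.statistic, bad.pvalue)
print(good.statistic, good.pvalue)
```

With the corrected args, the statistic drops back to the small values seen in the (0.0, 1.0) case and the p-value is no longer near zero.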