
Scipy kstest returns different p-values for similar sets of values


In Python 3.6.5 and scipy 1.1.0, when I run a Kolmogorov-Smirnov test for a uniform distribution, I get opposite conclusions (judging by the p-value) depending on whether I pass kstest a column vector or a row vector containing the same values:

>>> from scipy import stats
>>> import numpy as np

>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(10000, 1)), 'uniform')

KstestResult(statistic=0.9999321616877249, pvalue=0.0)

>>> np.random.seed(seed=123)
>>> stats.kstest(np.random.uniform(low=0, high=1, size=(1, 10000)), 'uniform')

KstestResult(statistic=0.9999321616877249, pvalue=0.00013567662455016283)

Do you know why this would be the case?


Solution

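  • The kstest docstring says that when the first argument is an array, it is expected to be one-dimensional. In your examples you are passing two-dimensional arrays (with one trivial dimension in each case), and kstest does not handle that correctly. In scipy 1.1.0 it sorts the data with np.sort, which works along the last axis, takes the sample size from len(), which counts only the first axis, and then subtracts the CDF values in an expression that broadcasts to a 2-D array. Both calls therefore report the same meaningless statistic (effectively max(1 - min(x), max(x)), hence the value close to 1), but the column vector is treated as a sample of size 10000, giving a p-value of 0, while the row vector is treated as a sample of size 1, giving 2 * (1 - statistic), about 0.000136.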

    The easy fix is to flatten the array with its ravel() method before passing it to kstest. For example:

    In [50]: np.random.seed(seed=123)
    
    In [51]: x = np.random.uniform(low=0, high=1, size=(10000, 1))
    
    In [52]: stats.kstest(x.ravel(), 'uniform')
    Out[52]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)
    
    In [53]: np.random.seed(seed=123)
    
    In [54]: x = np.random.uniform(low=0, high=1, size=(1, 10000))
    
    In [55]: stats.kstest(x.ravel(), 'uniform')
    Out[55]: KstestResult(statistic=0.008002577626569918, pvalue=0.5437230826096209)
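To see the shape problem without running the full test, here is a simplified sketch; it is not scipy's source. The hypothetical `naive_ks_statistic` computes the two-sided statistic with the textbook formula and no shape check (an assumption that this matches the old kstest only in outline), and `kstest_1d` is a hypothetical wrapper that squeezes out trivial dimensions and refuses genuinely 2-D input. The sample size is reduced to 1000 because the column case broadcasts to an n-by-n array.

```python
import numpy as np
from scipy import stats


def naive_ks_statistic(sample, cdf):
    """Two-sided KS statistic with no input-shape validation.

    Hypothetical sketch: assumed to mirror the old kstest logic only in outline.
    """
    vals = np.sort(sample)   # np.sort sorts along the LAST axis only
    n = len(vals)            # len() is the size of the FIRST axis only
    cdfvals = cdf(vals)
    # With 2-D input these subtractions broadcast to a 2-D array instead of
    # pairing the i-th order statistic with i/n.
    d_plus = (np.arange(1.0, n + 1) / n - cdfvals).max()
    d_minus = (cdfvals - np.arange(0.0, n) / n).max()
    return max(d_plus, d_minus), n


def kstest_1d(sample, cdf, **kwargs):
    """Hypothetical helper: call stats.kstest only on genuinely 1-D data."""
    x = np.squeeze(np.asarray(sample))  # (N, 1) or (1, N) -> (N,)
    if x.ndim != 1:
        # Refuse to guess how a real 2-D array should be flattened.
        raise ValueError(f"expected a 1-D sample, got shape {np.shape(sample)}")
    return stats.kstest(x, cdf, **kwargs)


rng = np.random.default_rng(123)
u = rng.uniform(size=1000)

col_stat, col_n = naive_ks_statistic(u.reshape(-1, 1), stats.uniform.cdf)
row_stat, row_n = naive_ks_statistic(u.reshape(1, -1), stats.uniform.cdf)
flat_stat, flat_n = naive_ks_statistic(u, stats.uniform.cdf)

print(col_stat, col_n)    # statistic close to 1, n = 1000
print(row_stat, row_n)    # the same statistic, but n = 1
print(flat_stat, flat_n)  # a small statistic, n = 1000
print(kstest_1d(u.reshape(1, -1), 'uniform'))
```

Raising on a genuinely 2-D array, rather than calling ravel() unconditionally, is a deliberate choice in this sketch: ravel() would silently pool what may be several independent samples into one.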