python, statistics, p-value, kolmogorov-smirnov

Why is my p-value 0 and my statistic 1 when I use the KS test in Python?


Thanks in advance to anyone who takes a look.

My code is:

import numpy as np
from scipy.stats import kstest
data=[31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437, 60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832, 141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231, 330228, 330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013, 504993, 518475, 531767, 551032, 782483, 913658, 1432195, 1712510, 2726323, 2777535, 3996759, 13608152]
x=np.array(data)
test_sta=kstest(x, 'norm')
print(test_sta)

The result of kstest is KstestResult(statistic=1.0, pvalue=0.0). Is there something wrong with my code, or is the data just not normal at all?
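To see where the 1.0 comes from, here is a small diagnostic sketch. It only uses `scipy.stats.norm.cdf` and the `args` parameter of `kstest` (which passes location and scale to the reference distribution); the variable names are illustrative. It evaluates the standard normal CDF at the data points and then re-runs the test against a normal distribution with the sample's own mean and standard deviation.

```python
import numpy as np
from scipy.stats import kstest, norm

data = [31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437,
        60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832,
        141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231,
        330228, 330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013,
        504993, 518475, 531767, 551032, 782483, 913658, 1432195, 1712510,
        2726323, 2777535, 3996759, 13608152]
x = np.array(data, dtype=float)

# kstest(x, 'norm') compares against the *standard* normal (loc=0, scale=1).
# For values this large, its CDF rounds to exactly 1.0 in double precision.
cdf_vals = norm.cdf(x)
print("smallest CDF value:", cdf_vals.min())

# The empirical CDF is 0 just below the smallest data point, while the
# theoretical CDF is already 1.0 there, so the largest gap D is 1.
print("statistic vs N(0, 1):", kstest(x, 'norm').statistic)

# Supplying loc/scale through `args` compares against N(mean, std) instead,
# which is equivalent to standardizing x yourself before calling kstest.
res = kstest(x, 'norm', args=(x.mean(), x.std()))
print("statistic vs N(mean, std):", res.statistic, "p-value:", res.pvalue)
```

In other words, the statistic of 1.0 says nothing about the shape of the data; it only reflects that the data lives nowhere near the range covered by N(0, 1).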


Solution

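  • I've not used this before, but I think you're testing whether your data is standard-normal (i.e. mean=0, variance=1), which is what kstest(x, 'norm') does when you don't pass any parameters. Every value in your data is above 30,000, so the standard normal CDF is 1.0 at every data point. That makes the maximum gap between the empirical and theoretical CDFs equal to 1, and the p-value drops to 0.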

    Plotting a histogram shows that it's much closer to a log-normal distribution, so I'd log-transform and standardize it first:

    x = np.log(data)
    x -= np.mean(x)
    x /= np.std(x)
    kstest(x, 'norm')
    

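    which gives me a test statistic of 0.095 and a p-value of 0.75, so we can't reject the hypothesis that the data is log-normal. (Strictly speaking, because the mean and standard deviation were estimated from the same data, the plain KS p-value is conservative here; the Lilliefors test is designed for that situation.)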

    A good way to check this sort of thing is to generate some random data from a known distribution and see what the test gives you back. For example:

    kstest(np.random.normal(size=100), 'norm')
    

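    usually gives me large p-values (when the null hypothesis is true, the p-value is uniformly distributed between 0 and 1, so it falls below 0.05 only about 5% of the time), while: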

    kstest(np.random.normal(loc=13, size=100), 'norm')
    

    gives me p-values near 0.

    A log-normal distribution just means that the data is normally distributed after a log transform. If you really want to test against a normal distribution, skip the log transform but still standardize, e.g.:

    x = np.array(data, dtype=float)
    x -= np.mean(x)
    x /= np.std(x)
    kstest(x, 'norm')
    

    which gives me a p-value of 7e-7, indicating that we can reliably reject the hypothesis that it's normally distributed.
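The simulation idea from the answer is easier to read when repeated many times rather than once, because a single p-value under the null can land anywhere between 0 and 1. A minimal sketch follows; the seed, the 500 trials and the 0.05 threshold are arbitrary choices for illustration, and it uses NumPy's `default_rng` generator rather than the legacy `np.random.normal`.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n_trials = 500

# Null hypothesis true: samples really come from N(0, 1).
p_null = np.array([kstest(rng.normal(size=100), 'norm').pvalue
                   for _ in range(n_trials)])

# Null hypothesis badly false: samples come from N(13, 1).
p_shift = np.array([kstest(rng.normal(loc=13, size=100), 'norm').pvalue
                    for _ in range(n_trials)])

# Under the null, p-values are roughly uniform: mean near 0.5 and
# about 5% of them below 0.05.
print("null:  mean p =", p_null.mean(), " fraction p < 0.05 =", np.mean(p_null < 0.05))

# With the wrong location, every p-value is essentially 0.
print("shift: largest p =", p_shift.max())
```

Looking at the fraction of rejections rather than individual p-values is what tells you whether the test is behaving as expected.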