Python KS test fails to identify a normal distribution?

I am learning statistics, and I want to check whether a data set comes from a normal distribution.

I found that a KS test can do this. My code is listed below:

In [1]: from scipy import stats

In [2]: from read_cj import read

In [3]: df = read()
[read] cost 10.066437721252441

In [4]: stats.kstest(df['XH.self_rank(30)'],'norm')
Out[4]: KstestResult(statistic=0.3203690716401366, pvalue=0.0)

This result seems to mean that my column XH.self_rank(30) is normally distributed.

But the histogram looks like this:

I don't think it comes from a normal distribution.

I also tried a few more cases:

In [9]: stats.kstest([1,2,3,4], 'norm')
Out[9]: KstestResult(statistic=0.8413447460685429, pvalue=0.0012672077773713667)

In [10]: stats.kstest([1]*10000, 'norm')
Out[10]: KstestResult(statistic=0.8413447460685429, pvalue=0.0)

As you can see, [1]*10000 is still considered to come from a normal distribution, and [1]*10000 has the same statistic value as [1, 2, 3, 4] but a different p-value. This confuses me.

I think this kind of histogram is a normal distribution:

Did I miss anything? Can you help with this?

Solution

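The null hypothesis of the Kolmogorov-Smirnov test is that the sample comes from the reference distribution. With 'norm' and no extra arguments, that reference is the standard normal distribution (mean 0, standard deviation 1). A p-value near zero therefore rejects that hypothesis; it does not confirm it.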

from scipy import stats
import random

print(stats.kstest([1] * 1000, 'norm').pvalue) # 0.0
print(stats.kstest([random.gauss(0, 1) for _ in range(1000)], 'norm').pvalue) # e.g. 0.7275173462861986 (varies from run to run)

You can see that the constant sample leads to a p-value of zero, which strongly suggests it is not normal. The normal sample, on the other hand, leads to a large p-value, so the test finds no evidence against the sample coming from a standard normal distribution.
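One thing worth checking is which normal distribution is being tested. The sketch below assumes hypothetical data drawn from a normal distribution with mean 5 and standard deviation 2 (these numbers and the seed are made up for illustration). It compares the default 'norm' test against one where loc and scale are passed through the args parameter of stats.kstest.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
sample = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical N(5, 2) data

# The default 'norm' is N(0, 1), so this normal but shifted sample is rejected.
res_standard = stats.kstest(sample, 'norm')

# Passing loc and scale through args tests against N(5, 2) instead.
res_matched = stats.kstest(sample, 'norm', args=(5.0, 2.0))

print(res_standard.statistic, res_standard.pvalue)
print(res_matched.statistic, res_matched.pvalue)
```

The first test should report a large statistic and a p-value near zero even though the data are normal, while the second should report a small statistic. If the mean and standard deviation are estimated from the same sample instead of being known in advance, the p-value is only approximate and tends to be too large.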

The same applies to your case. All the suspected samples show p-values near zero, indicating that they do not come from the standard normal distribution that 'norm' tests against. So stats.kstest is not broken, in my opinion.
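As for why [1, 2, 3, 4] and [1]*10000 give the same statistic but different p-values: the statistic is the largest gap between the sample's empirical CDF and the reference CDF. In both samples the smallest value is 1, where the standard normal CDF is already about 0.8413 while the empirical CDF just below 1 is still 0, so that gap is the largest in both cases. The p-value also depends on the sample size, which is why it differs. A minimal sketch of that calculation, where ks_statistic is a hypothetical helper written for illustration and not part of SciPy:

```python
import numpy as np
from scipy import stats

def ks_statistic(sample):
    """Two-sided KS statistic against the standard normal (illustrative helper)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    cdf = stats.norm.cdf(x)
    # Gap where the empirical CDF (after each jump) lies above the reference CDF.
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    # Gap where the reference CDF lies above the empirical CDF (before each jump).
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

stat_small = ks_statistic([1, 2, 3, 4])
stat_large = ks_statistic([1] * 10000)
cdf_at_one = stats.norm.cdf(1)

print(stat_small, stat_large, cdf_at_one)
```

All three printed values should agree, matching the 0.8413... statistic in the question. With 10000 observations the same gap is far stronger evidence than with 4, so the p-value drops from about 0.0013 to 0.0.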