
scipy.stats.normaltest() to test the normality of numpy.random.normal()


I used scipy.stats.normaltest() to test the normality of the data generated by numpy.random.normal(). Here is the code:

import numpy
import scipy.stats

# Run the test on 10 independent samples of 50,000 standard-normal values
for i in range(10):
    d = numpy.random.normal(size=50000)
    n = scipy.stats.normaltest(d)  # (statistic, pvalue)
    print(n)

Here are the results:

(1.554124262066523, 0.45975472830684272)
(2.4982341884494002, 0.28675786530134384)
(2.0918010143075256, 0.35137526093176125)
(0.90623072927961634, 0.63564479846313271)
(2.3015160217986934, 0.31639684620041014)
(3.4005006481463624, 0.18263779969208352)
(2.5241123233368978, 0.28307138716898311)
(12.705060069198185, 0.001742333391388526)
(0.83646951793409796, 0.65820769012847313)
(0.12008522338293379, 0.94172440425950443)

According to the documentation, the second element of the value returned by normaltest() is:

pvalue : float or array
  A 2-sided chi squared probability for the hypothesis test.
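The quoted description can be made concrete by recomputing the pvalue by hand. This is a minimal sketch that assumes normaltest works the way the SciPy implementation does: it squares and adds the z-scores returned by scipy.stats.skewtest and scipy.stats.kurtosistest (D'Agostino and Pearson's K² statistic) and then takes the upper tail of a chi-squared distribution with 2 degrees of freedom. The seed is an arbitrary choice for reproducibility.

```python
import numpy as np
from scipy import stats

# Arbitrary seed so the sample is reproducible
rng = np.random.default_rng(0)
d = rng.normal(size=50000)

# z-scores for skewness and kurtosis, each approximately N(0, 1) under normality
z_skew = stats.skewtest(d).statistic
z_kurt = stats.kurtosistest(d).statistic

# K^2 = z_skew^2 + z_kurt^2 is approximately chi-squared with 2 degrees of freedom
k2 = z_skew**2 + z_kurt**2

# The p-value is the upper-tail probability P(chi2 with 2 df >= k2)
p_manual = stats.chi2.sf(k2, df=2)

# Compare with the library result
result = stats.normaltest(d)
print(k2, result.statistic)
print(p_manual, result.pvalue)
```

Because both z-scores are squared, large deviations in either direction raise K², which is why the documentation calls the probability "2-sided".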

If my understanding is correct, it indicates how likely it is that the input data is normally distributed. I had expected all the p-values generated by the above code to be very close to 1. However, some of them are as small as 0.001742333391388526. What's wrong here?


Solution

  • If my understanding is correct, it indicates how likely it is that the input data is normally distributed. I had expected all the p-values generated by the above code to be very close to 1.

    Your understanding is incorrect, I'm afraid. The p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true (i.e. assuming the data really is normally distributed). It is not the probability that the data is normal, and it does not need to be close to 1: when the data really is normal, the p-value is approximately uniformly distributed between 0 and 1. Usually, p-values greater than 0.05 are considered not significant, which means that the test has not rejected normality (it has not proven it either).

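    As pointed out by Victor Chubukov, you can get low p-values simply by chance, even if the data is really normally distributed. With a 0.05 threshold, roughly 1 in 20 samples of truly normal data will be flagged as non-normal, so an occasional small p-value is expected and is not, by itself, evidence that numpy.random.normal() is producing non-normal data.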

    Statistical hypothesis testing is subtle and can seem counterintuitive. For more detail, Cross Validated (the statistics Stack Exchange site) is a good place to ask.
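The claim that truly normal data still produces small p-values can be checked by simulation. The sketch below runs normaltest on many independent samples from NumPy's normal generator and reports how often p falls below 0.05. The seed, the 1,000 trials, the sample size of 5,000 (smaller than the question's 50,000 to keep memory use modest), and the 0.05 threshold are all arbitrary choices. Because the test is asymptotic, the rejection rate should land near 5% rather than exactly on it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary seed for reproducibility
n_trials = 1000                     # number of independent samples (arbitrary)
sample_size = 5000                  # values per sample (arbitrary)
alpha = 0.05                        # conventional significance threshold

# Each row is one sample of standard-normal values, so the null hypothesis is true
samples = rng.normal(size=(n_trials, sample_size))

# normaltest defaults to axis=0; axis=1 gives one p-value per row (per sample)
pvalues = stats.normaltest(samples, axis=1).pvalue

# Fraction of truly normal samples that the test would call "non-normal"
rejection_rate = np.mean(pvalues < alpha)
print(f"fraction of p-values below {alpha}: {rejection_rate:.3f}")
print(f"smallest p-value: {pvalues.min():.2g}")

# Under the null hypothesis the p-values should spread roughly evenly over [0, 1]
counts, _ = np.histogram(pvalues, bins=10, range=(0, 1))
print("p-values per decile:", counts)
```

Each decile should hold roughly a tenth of the p-values. Lowering alpha reduces how often normal data is wrongly flagged, but never to zero.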