Tags: python, plotly, p-value, kolmogorov-smirnov

Interpretation of p-value in normality tests in Python


I am performing normality tests on my data. In general I would expect the data to be approximately normal (normal enough), as supported by a histogram of the raw values and a Q-Q plot.

[Figure: histogram of data]

I have performed Kolmogorov-Smirnov and Shapiro-Wilk tests, and this is where I get confused. My p-values are nearly 0:

Kolmogorov-Smirnov: statistic = 0.78, p-value = 0.0
Shapiro-Wilk: statistic = 0.99, p-value = 1.2e-05

This would have me believe that I should reject the null hypothesis. I was going to assume that this is because the mean and standard deviation of my data differ from the 0 and 1, respectively, that the KS test assumes, as explained here. But then I stumbled across the tutorial on normality tests in plotly (plotly tutorial on normality tests), where for both tests the low p-values apparently support the null hypothesis!

Has anything changed in the way the tests are performed? Or is it an error on the tutorial's page?


Solution

  • It seems to be an error in the tutorial. As they state (the classical definition), the null hypothesis is that there is no significant difference between the reference distribution and the tested one. This hypothesis should be rejected when the p-value is smaller than your threshold (equivalently, when the test statistic is greater than the critical value). The same tutorial says as much on the page it links to for more information about how to accept or reject the null hypothesis.

    Therefore I believe it is an error. In both examples, the null hypothesis of no difference should be rejected, as the p-values seem to be smaller than 0.05 and the test statistics are greater than their respective critical values.
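The decision rule above, and the point about the KS reference distribution raised in the question, can be illustrated with a minimal sketch. It assumes `scipy.stats` is being used, a threshold of 0.05, and a synthetic sample (mean 50, standard deviation 5) standing in for the real data; the helper `rejects_normality` is a hypothetical name introduced only for this example. Note that estimating the mean and standard deviation from the same sample makes the resulting KS p-value only approximate (it tends to be too large), so the second call is a rough check rather than an exact test.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance threshold


def rejects_normality(p_value, alpha=ALPHA):
    """Hypothetical helper: True when the null hypothesis (normality) is rejected."""
    return p_value < alpha


# Synthetic stand-in for the real data: normal, but NOT standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=5.0, size=500)

# KS test against the default reference N(0, 1): the mismatch in mean and
# scale alone produces a huge statistic and a p-value of essentially 0.
ks_default_stat, ks_default_p = stats.kstest(sample, "norm")

# KS test against a normal with the sample's own mean and standard deviation.
# The parameters are estimated from the data, so this p-value is approximate.
ks_fitted_stat, ks_fitted_p = stats.kstest(
    sample, "norm", args=(sample.mean(), sample.std(ddof=1))
)

# Shapiro-Wilk does not depend on the location or scale of the data.
sw_stat, sw_p = stats.shapiro(sample)

results = [
    ("KS vs N(0, 1)", ks_default_stat, ks_default_p),
    ("KS vs fitted normal", ks_fitted_stat, ks_fitted_p),
    ("Shapiro-Wilk", sw_stat, sw_p),
]
for name, stat, p in results:
    verdict = "reject H0" if rejects_normality(p) else "fail to reject H0"
    print(f"{name}: statistic={stat:.3f}, p-value={p:.3g} -> {verdict}")
```

Under this rule, both p-values reported in the question (0.0 and 1.2e-05) lead to rejecting the null hypothesis, which is the opposite of what the tutorial concludes.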