Search code examples
rstatisticsdistributionnormal-distribution

Critical Value for Shapiro Wilk test


I am trying to get the critical W value for a Shapiro Wilk Test in R.

        Shapiro-Wilk normality test

data:  samplematrix[, 1]
W = 0.69661, p-value = 7.198e-09

With n=50 and alpha=.05, I know that the critical value W=.947, by conducting the critical value table. However, how do I get this critical value, using R?


Solution

  • Computing critical values directly is not easy (see this CrossValidated answer); what I've got here is essentially the same as what's in that answer (although I came up with it independently, and it improves on that answer slightly by using order statistics rather than random samples). The idea is that we can make a sample progressively more non-Normal until it gets exactly the desired p-value (0.05 in this case), then see what W-statistic corresponds to that sample.

    ## compute S-W for a given Gamma shape parameter and sample size
    tmpf <- function(gshape=20,n=50) {
        shapiro.test(qgamma((1:n)/(n+1),scale=1,shape=gshape))
    }
    ## find shape parameter that corresponds to a particular p-value
    find.shape <- function(n,alpha) {
        uniroot(function(x) tmpf(x,n)$p.value-alpha,
                interval=c(0.01,100))$root
    }
    find.W <- function(n,alpha) {
       s <- find.shape(n,alpha)
       tmpf(s,n=n)$statistic
    }
    find.W(50,0.05)
    

    The answer (0.9540175) is not quite the same as the answer you got, because R uses an approximation to the Shapiro-Wilk test. As far as I know, the actual S-W critical value tables stem entirely from Shapiro and Wilk 1965 Biometrika http://www.jstor.org/stable/2333709 p. 605, which says only "Based on fitted Johnson (1949) S_B approximation, see Shapiro and Wilk 1965a for details" - and "Shapiro and Wilk 1965a" refers to an unpublished manuscript! (S&W essentially sampled many Normal deviates, computed the SW statistic, constructed smooth approximations of the SW statistic over a range of values, and took the critical values from this distribution).

    I also tried to do this by brute force, but (see below) if we want to be naive and not do curve-fitting as SW did, we will need much larger samples ...

    find.W.stoch <- function(n=50,alpha=0.05,N=200000,.progress="none") {
       d <- plyr::raply(N,.Call(stats:::C_SWilk,sort(rnorm(n))),
                        .progress=.progress)
       return(quantile(d[1,],1-alpha))
    }
    

    Compare original S&W values (transcribed from the papers) with the R approximation:

      SW1965 <- c(0.767,0.748,0.762,0.788,0.803,0.818,0.829,0.842,
        0.850,0.859,0.866,0.874,0.881,0.887,0.892,0.897,0.901,0.905,
        0.908,0.911,0.914,0.916,0.918,0.920,0.923,0.924,0.926,0.927,
        0.929,0.930,0.931,0.933,0.934,0.935,0.936,0.938,0.939,0.940,
        0.941,0.942,0.943,0.944,0.945,0.945,0.946,0.947,0.947,0.947)
      Rapprox <- sapply(3:50,find.W,alpha=0.05)
      Rapprox.stoch <- sapply(3:50,find.W.stoch,alpha=0.05,.progress="text")
      par(bty="l",las=1)
      matplot(3:50,cbind(SW1965,Rapprox,Rapprox.stoch),col=c(1,2,4),
              type="l",
              xlab="n",ylab=~W[crit])
      legend("bottomright",col=c(1,2,4),lty=1:3,
             c("SW orig","R approx","stoch"))
    

    enter image description here