r statistics distribution normal-distribution

Critical Value for Shapiro Wilk test

I am trying to get the critical W value for a Shapiro Wilk Test in R.

        Shapiro-Wilk normality test

data:  samplematrix[, 1]
W = 0.69661, p-value = 7.198e-09

With n=50 and alpha=.05, I know that the critical value W=.947, by conducting the critical value table. However, how do I get this critical value, using R?

Solution

Computing critical values directly is not easy (see this CrossValidated answer); what I've got here is essentially the same as what's in that answer (although I came up with it independently, and it improves on that answer slightly by using order statistics rather than random samples). The idea is that we can make a sample progressively more non-Normal until it gets exactly the desired p-value (0.05 in this case), then see what W-statistic corresponds to that sample.

## compute S-W for a given Gamma shape parameter and sample size
tmpf <- function(gshape=20,n=50) {
    shapiro.test(qgamma((1:n)/(n+1),scale=1,shape=gshape))
}
## find shape parameter that corresponds to a particular p-value
find.shape <- function(n,alpha) {
    uniroot(function(x) tmpf(x,n)$p.value-alpha,
            interval=c(0.01,100))$root
}
find.W <- function(n,alpha) {
   s <- find.shape(n,alpha)
   tmpf(s,n=n)$statistic
}
find.W(50,0.05)

The answer (0.9540175) is not quite the same as the answer you got, because R uses an approximation to the Shapiro-Wilk test. As far as I know, the actual S-W critical value tables stem entirely from Shapiro and Wilk 1965 Biometrika http://www.jstor.org/stable/2333709 p. 605, which says only "Based on fitted Johnson (1949) S_B approximation, see Shapiro and Wilk 1965a for details" - and "Shapiro and Wilk 1965a" refers to an unpublished manuscript! (S&W essentially sampled many Normal deviates, computed the SW statistic, constructed smooth approximations of the SW statistic over a range of values, and took the critical values from this distribution).

I also tried to do this by brute force, but (see below) if we want to be naive and not do curve-fitting as SW did, we will need much larger samples ...

find.W.stoch <- function(n=50,alpha=0.05,N=200000,.progress="none") {
   d <- plyr::raply(N,.Call(stats:::C_SWilk,sort(rnorm(n))),
                    .progress=.progress)
   return(quantile(d[1,],1-alpha))
}

Compare original S&W values (transcribed from the papers) with the R approximation:

  SW1965 <- c(0.767,0.748,0.762,0.788,0.803,0.818,0.829,0.842,
    0.850,0.859,0.866,0.874,0.881,0.887,0.892,0.897,0.901,0.905,
    0.908,0.911,0.914,0.916,0.918,0.920,0.923,0.924,0.926,0.927,
    0.929,0.930,0.931,0.933,0.934,0.935,0.936,0.938,0.939,0.940,
    0.941,0.942,0.943,0.944,0.945,0.945,0.946,0.947,0.947,0.947)
  Rapprox <- sapply(3:50,find.W,alpha=0.05)
  Rapprox.stoch <- sapply(3:50,find.W.stoch,alpha=0.05,.progress="text")
  par(bty="l",las=1)
  matplot(3:50,cbind(SW1965,Rapprox,Rapprox.stoch),col=c(1,2,4),
          type="l",
          xlab="n",ylab=~W[crit])
  legend("bottomright",col=c(1,2,4),lty=1:3,
         c("SW orig","R approx","stoch"))