
How to simplify code in R (normality test): different sample sizes in 1 line or 2 lines of code?


I want to write my normality tests more cleanly and run them as a simulation (repeating each test 1,000 times).

sample <- c(10,30,50,100,500)
shapiro.test(rnorm(sample))

    Shapiro-Wilk normality test

data:  rnorm(sample)
W = 0.90644, p-value = 0.4465

As shown above, this produces only one result. How do I get five, one per sample size? Am I missing something?

Using the replicate function gives me 1,000 test results per sample size, but I am only interested in the p-values and how they compare to a significance level. For the individual normality tests I used the following code (thanks to user StupidWolf, from one of my previous questions on Stack Overflow):

replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
table(replicate_sw10["p.value",]<0.10)/1000
# which gave the following output:
# FALSE  TRUE 
# 0.896 0.104

Solution

  • You can simply extract $p.value from each test. The code below yields a matrix with 1,000 rows (one per repetition) and 5 columns (one per sample size in smpl). If you want a list as the result, use lapply instead of sapply.

    smpl <- c(10, 30, 50, 100, 500)
    
    set.seed(42)  ## for sake of reproducibility
    
    res <- sapply(smpl, function(x) replicate(1e3, shapiro.test(rnorm(x))$p.value))
    head(res)
    #            [,1]      [,2]       [,3]      [,4]      [,5]
    # [1,] 0.43524553 0.5624891 0.02116901 0.8972087 0.8010757
    # [2,] 0.67500688 0.1417968 0.03722656 0.7614192 0.7559309
    # [3,] 0.52777713 0.6728819 0.67880178 0.1455375 0.7734797
    # [4,] 0.55618980 0.1736095 0.69879316 0.4950400 0.5181642
    # [5,] 0.93774782 0.9077292 0.58930787 0.2687687 0.8435223
    # [6,] 0.01444456 0.1214157 0.07042380 0.4479121 0.7982574
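
To relate the p-values to a significance level, as the question asks, one option is to take column means of a logical comparison. The following is a minimal sketch that repeats the simulation above. The names alpha and reject_rate and the column labels are introduced here for illustration only, and 0.10 is the level used in the question. The last line shows why the original call returned a single result: when rnorm() is given a vector of length greater than one, it uses the length of that vector as the number of draws.

```r
smpl  <- c(10, 30, 50, 100, 500)
alpha <- 0.10  # significance level from the question (illustrative name)

set.seed(42)
res <- sapply(smpl, function(n) replicate(1e3, shapiro.test(rnorm(n))$p.value))
colnames(res) <- paste0("n=", smpl)  # label columns by sample size

# Share of the 1,000 p-values below alpha, one value per sample size
reject_rate <- colMeans(res < alpha)
reject_rate

# rnorm() with a length-5 vector draws 5 values, not 5 separate samples
length(rnorm(smpl))
```

Since the data are simulated from a normal distribution, each entry of reject_rate estimates the type I error rate of the test at that sample size, so values near alpha are expected.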