
Simulate in R the number of samples needed in order to achieve the true standard deviation


[Figure from the question: image not available]

I want to recreate the figure above in R. It simulates how many samples are needed for the estimate to approach the true standard deviation. How can I do this in R?

I assume the underlying distribution is either a t-distribution or a normal distribution. So I would generate numbers from one of these distributions, increase the sample size each time, and plot the results to recreate the figure. Any help?

set.seed(123)

x <- list(v1=rnorm(1,0,12),v2=rnorm(10,0,11),
          v3=rnorm(20,0,10),v4=rnorm(30,0,9),
          v5=rnorm(40,0,8),v6=rnorm(50,0,7),
          v7=rnorm(60,0,6),v8=rnorm(70,0,5),
          v9=rnorm(80,0,4),v10=rnorm(90,0,3),
          v11=rnorm(100,0,2),v12=rnorm(110,0,2))

g = lapply(x,sd)
g
g1 = unlist(g)
plot(g1,type="l")

Solution

  • First, start with a large population drawn from a uniform distribution, and choose the sample sizes for which you want to compute the standard error of the mean.

    set.seed(123)
    
    x <- runif(1e6, 0, 1)
    sample_size <- 5:120
    

    Next, define a function to compute sigma_m, the estimated standard error of the mean. It draws a sample of size n from x with replacement, takes the sample standard deviation, and divides it by sqrt(n).

    calc_sigma_m <- function(n, x) {
      sd(sample(x, n, replace = TRUE))/sqrt(n)
    }
    

    A data frame can neatly store the sample sizes and sigma_m values for plotting:

    df <- data.frame(sample_size, 
                     sigma_m = sapply(sample_size, calc_sigma_m, x))
    

    Your initial plot will look like this:

    library(ggplot2)
    
    ggplot(df, aes(sample_size, sigma_m)) +
      geom_line()
    

    [Figure: noisy plot of sigma_m vs. sample size]

    As expected, the curve is not smooth, especially at small sample sizes, because each point comes from a single random sample.

    If you want a smooth curve for demonstration, repeat the sampling and sigma_m calculation many times for each sample size (here 1000 times) and take the mean.

    # Average sigma_m over 1000 resamples of size n to reduce the noise
    calc_sigma_m_mean <- function(n, x) {
      mean(replicate(1000, sd(sample(x, n, replace = TRUE)) / sqrt(n)))
    }
    
    df <- data.frame(sample_size, sigma_m = sapply(sample_size, calc_sigma_m_mean, x))
    

    Then you will get a smoother curve:

    ggplot(df, aes(sample_size, sigma_m)) +
      geom_line()
    

    [Figure: smooth curve of sigma_m vs. sample size]
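
As a sanity check on the simulation above, the averaged curve can be compared with the theoretical standard error sigma / sqrt(n). The following is only a minimal sketch in base R, assuming the same Uniform(0, 1) population as above, whose standard deviation is sqrt(1/12). The names `sim` and `theory`, the `reps` argument, the smaller population, and the reduced 200 replicates are illustrative choices to keep the run short, not part of the original answer.

```r
set.seed(123)

x <- runif(1e5, 0, 1)        # population (assumed smaller here so it runs quickly)
sample_size <- 5:120

# Mean of the estimated standard error over `reps` resamples of size n
calc_sigma_m_mean <- function(n, x, reps = 200) {
  mean(replicate(reps, sd(sample(x, n, replace = TRUE)) / sqrt(n)))
}

sim    <- sapply(sample_size, calc_sigma_m_mean, x)
theory <- sqrt(1 / 12) / sqrt(sample_size)   # sigma / sqrt(n) for Uniform(0, 1)

# Simulated curve (solid) against the theoretical curve (dashed red)
plot(sample_size, sim, type = "l",
     xlab = "sample size", ylab = "sigma_m")
lines(sample_size, theory, col = "red", lty = 2)
legend("topright", legend = c("simulated", "theoretical"),
       col = c("black", "red"), lty = c(1, 2))
```

The two curves should nearly coincide. At very small n the simulated values tend to sit slightly below the theoretical ones, because the sample standard deviation is a slightly biased estimator of sigma.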