Search code examples
rmean

Mean comparison using Johnson distribution


I have to use Johnson distribution to see how different means are when modifying the skewness value (.3 in the code below):

library(moments)
library(SuppDists)

k <- 500

parms <-JohnsonFit(c(0, 1, .3, 6))
sJohnson(parms)
poblacion <- rJohnson(1000, parms)

mu.pob <- mean(poblacion)
sd.pob <- sd(poblacion)

p <- vector(length=k)
for (i in p){
  muestra <- poblacion[rJohnson(1000, parms)]
  p[i] <- t.test(muestra, mu = mu.pob)$p.value
}

a_teo = 0.05
a_emp = length(p[p<a_teo])/k
sprintf("alpha_teo = %.3f <-> alpha_emp = %.3f", a_teo, a_emp)

If I change the .3 for 1, I got different values for mean and standard deviation, but I got exactly the same empirical value for alpha: 1.000. What is wrong with my code?


Solution

  • There are a few errors in your code.

    • Instead of for(i in p) it should be for(i in seq.int(k));
    • The way you sample/resample, poblacion[rJohnson(1000, parms)], is wrong. The index rJohnson is not integer and it will not index ppoblacion. The correct ways should be any of the ways in the for loop below.
    • Though not wrong your calculation of a_emp is too complicated, see the equivalent mean(p < a_teo).

    And here is the complete code corrected.

    library(moments)
    library(SuppDists)
    
    set.seed(2024)
    
    k <- 500
    
    parms <-JohnsonFit(c(0, 1, .3, 6))
    # sJohnson(parms)
    poblacion <- rJohnson(1000, parms)
    
    mu.pob <- mean(poblacion)
    sd.pob <- sd(poblacion)
    
    p <- numeric(k)
    for(i in seq.int(k)){
      # muestra <- sample(poblacion, 1000, TRUE)
      muestra <- rJohnson(1000, parms)
      p[i] <- t.test(muestra, mu = mu.pob)$p.value
    }
    
    a_teo <- 0.05
    a_emp <- length(p[p<a_teo])/k
    mean(p < a_teo)
    #> [1] 0.042
    sprintf("alpha_teo = %.3f <-> alpha_emp = %.3f", a_teo, a_emp)
    #> [1] "alpha_teo = 0.050 <-> alpha_emp = 0.042"
    

    Created on 2024-03-10 with reprex v2.1.0


    Edit

    Instead of the for loop above, you can use replicate. From help("replicate"), my emphasis:

    replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).

    p <- replicate(k, {
      muestra <- rJohnson(1000, parms)
      t.test(muestra, mu = mu.pob)$p.value
    })
    

    Note that with replicate you do not need to create p with the required length beforehand, it will be replicate's return value.