Sample from a skewed distribution in R


I want to sample numbers from a skewed distribution in R. Let's say I want to sample numbers from a distribution with mode 10 and 95% of values between 5 and 20.

Is there a function in R, similar to rnorm() or runif(), that can generate random numbers from such a distribution?


Solution

  • Choosing a log-normal distribution with μ = 2.415195 and σ = 0.3355733 gives a distribution that (approximately) meets your requirements.

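    mu <- 2.415195      # mean on the log scale (meanlog)
    sigma <- 0.3355733  # standard deviation on the log scale (sdlog)

    N <- 10000000       # number of samples to draw
    nums <- rlnorm(N, mu, sigma)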
    

    Approximately 95% of values are between 5 and 20.

    sum(5 < nums & nums < 20) / N
    #> [1] 0.9500141
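
    The same proportion can also be checked without simulation. A minimal sketch, assuming the mu and sigma values above and using plnorm, the log-normal CDF in base R (p_exact is just an illustrative name):

    ```r
    mu <- 2.415195
    sigma <- 0.3355733

    # Exact P(5 < X < 20) from the log-normal CDF
    p_exact <- plnorm(20, mu, sigma) - plnorm(5, mu, sigma)
    p_exact
    ```

    This should agree with the simulated proportion to within Monte Carlo error.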
    

    The mode is 10:

    library(ggplot2)
    library(tibble)

    ggplot(tibble(x = nums), aes(x)) +
      geom_density() +
      geom_vline(xintercept = 10, color = "red") +
      geom_vline(xintercept = c(5, 20), color = "blue")
    

    [Figure: density of the sampled values, with a red vertical line at 10 and blue vertical lines at 5 and 20]
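
    The mode can also be checked numerically rather than read off a plot. A sketch, assuming the same mu and sigma; mode_analytic and mode_numeric are illustrative names, and the search interval c(1, 30) is an arbitrary choice that brackets the peak:

    ```r
    mu <- 2.415195
    sigma <- 0.3355733

    # Closed-form mode of a log-normal distribution: exp(mu - sigma^2)
    mode_analytic <- exp(mu - sigma^2)

    # Location of the density's maximum, found numerically
    mode_numeric <- optimize(dlnorm, interval = c(1, 30),
                             meanlog = mu, sdlog = sigma,
                             maximum = TRUE)$maximum

    c(analytic = mode_analytic, numeric = mode_numeric)
    ```

    Both values should be close to 10; the numeric one is only as accurate as optimize's default tolerance.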


    I got these parameters using optimize.

    For any σ, we can calculate which μ gives a mode of 10, because the mode of a log-normal distribution is exp(μ - σ²). Setting this equal to 10 gives μ = log(10) + σ².

    So we want to find the σ that gets us closest to 95% of values between 5 and 20. That proportion is the difference cdf(20) - cdf(5), where the CDF of the log-normal distribution is cdf(x) = Φ((log(x) - μ) / σ), with Φ the standard normal CDF (plnorm in R).

    f <- function(sigma) {
      # mu that puts the mode at 10 for this sigma
      mu <- log(10) + sigma^2

      # distance of P(5 < X < 20) from the 95% target
      abs(plnorm(20, mu, sigma) - plnorm(5, mu, sigma) - 0.95)
    }
    
    optimize(f, lower = 0, upper = 1)
    #> $minimum
    #> [1] 0.3355733
    #> 
    #> $objective
    #> [1] 1.160349e-05
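
    The same idea can be wrapped in a reusable helper. This is only a sketch: the function name lnorm_from_mode and its arguments are made up for illustration, and it swaps optimize for uniroot, which solves "coverage equals target" directly. It assumes the coverage error changes sign between sigma_lower and sigma_upper, which holds for the numbers in the question.

    ```r
    lnorm_from_mode <- function(mode, lower, upper, coverage = 0.95,
                                sigma_lower = 0.01, sigma_upper = 1) {
      # Coverage error for a given sigma, with mu chosen so the mode is fixed
      g <- function(sigma) {
        mu <- log(mode) + sigma^2
        plnorm(upper, mu, sigma) - plnorm(lower, mu, sigma) - coverage
      }

      # Root of the coverage error over the assumed bracket for sigma
      sigma <- uniroot(g, lower = sigma_lower, upper = sigma_upper,
                       tol = 1e-10)$root

      list(mu = log(mode) + sigma^2, sigma = sigma)
    }

    fit <- lnorm_from_mode(mode = 10, lower = 5, upper = 20)
    nums <- rlnorm(1000, fit$mu, fit$sigma)
    ```

    For the values in the question, fit$mu and fit$sigma should land very close to the parameters found with optimize above. For other targets the bracket for sigma may need widening.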