I want to sample numbers from a skewed distribution in R. Let's say I want to sample numbers from a distribution with mode 10 and 95% of values between 5 and 20.
Is there a function in R, similar to rnorm() or runif(), that can generate random numbers from such a distribution?
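Choosing a log-normal distribution with μ = 2.415195 and σ = 0.3355733 gives a distribution that (approximately) meets your requirements. Here μ and σ are the mean and standard deviation on the log scale, i.e. the meanlog and sdlog arguments of rlnorm().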
mu <- 2.415195      # meanlog
sigma <- 0.3355733  # sdlog
N <- 10000000       # number of draws
nums <- rlnorm(N, mu, sigma)
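Since the question asks for a skewed distribution, it may help to confirm the skew directly. For a log-normal, the mode is exp(μ − σ²), the median is exp(μ) and the mean is exp(μ + σ²/2), so mode < median < mean (right skew). A quick check with the parameters above:

```r
mu <- 2.415195
sigma <- 0.3355733

# Closed-form location summaries of a log-normal(mu, sigma)
mode_x   <- exp(mu - sigma^2)      # about 10
median_x <- exp(mu)                # about 11.2
mean_x   <- exp(mu + sigma^2 / 2)  # about 11.8

c(mode = mode_x, median = median_x, mean = mean_x)
```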
Approximately 95% of values are between 5 and 20:
sum(5 < nums & nums < 20) / N
#> [1] 0.9500141
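The simulated share can also be checked without sampling: the probability of landing in (5, 20) is the difference of the log-normal CDF at the two endpoints, which plnorm() computes exactly.

```r
mu <- 2.415195
sigma <- 0.3355733

# P(5 < X < 20) = F(20) - F(5), using the log-normal CDF instead of simulation
coverage <- plnorm(20, meanlog = mu, sdlog = sigma) -
  plnorm(5, meanlog = mu, sdlog = sigma)
coverage
```

This avoids Monte Carlo noise and is much cheaper than drawing ten million values.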
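The mode is 10. In the density plot produced by this code, the peak lines up with the red line at 10, and the blue lines mark 5 and 20:
library(ggplot2)
library(tibble)
ggplot(tibble(x = nums), aes(x)) +
  geom_density() +
  geom_vline(xintercept = 10, color = "red") +
  geom_vline(xintercept = c(5, 20), color = "blue")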
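A kernel density plot only shows the mode visually. A minimal numerical check, assuming the peak lies somewhere in (1, 30), is to maximise the exact density dlnorm() with optimize() and compare against the closed-form mode exp(μ − σ²):

```r
mu <- 2.415195
sigma <- 0.3355733

# Maximise the log-normal density over an interval that brackets the peak;
# meanlog and sdlog are passed through optimize()'s ... to dlnorm()
peak <- optimize(dlnorm, interval = c(1, 30),
                 meanlog = mu, sdlog = sigma, maximum = TRUE)

peak$maximum        # numerical location of the peak
exp(mu - sigma^2)   # closed-form mode, for comparison
```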
I got these parameters using optimize().
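For any σ, we can calculate the μ that gives a mode of 10, because the mode of a log-normal distribution is $e^{\mu - \sigma^2}$, so a mode of 10 requires $\mu = \ln 10 + \sigma^2$.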
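For reference, the log-normal mode can be derived by setting the derivative of the log-density to zero (C collects the terms that do not depend on x):

```latex
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0

\ln f(x) = -\ln x - \frac{(\ln x - \mu)^2}{2\sigma^2} + C

\frac{d}{dx}\ln f(x) = \frac{1}{x}\left(-1 - \frac{\ln x - \mu}{\sigma^2}\right) = 0
\;\Longrightarrow\; \ln x = \mu - \sigma^2
\;\Longrightarrow\; \text{mode} = e^{\mu - \sigma^2}

\text{mode} = 10 \;\Longrightarrow\; \mu = \ln 10 + \sigma^2
```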
So we want to find the σ that gets us closest to 95% of values between 5 and 20. This probability is the difference between CDF(20) and CDF(5). The CDF of the log-normal distribution is $F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$, where $\Phi$ is the standard normal CDF (plnorm() in R).
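As a sanity check of that CDF formula, plnorm() agrees with the standard normal CDF pnorm() applied to the standardised log of x:

```r
mu <- 2.415195
sigma <- 0.3355733
x <- c(5, 10, 20)

# Log-normal CDF written through the standard normal CDF
cdf_manual <- pnorm((log(x) - mu) / sigma)
# Built-in log-normal CDF
cdf_builtin <- plnorm(x, meanlog = mu, sdlog = sigma)

all.equal(cdf_manual, cdf_builtin)
```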
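# Distance between the mass in (5, 20) and the 95% target, as a function of sigma;
# mu is fixed by requiring the mode exp(mu - sigma^2) to equal 10
f <- function(sigma) {
  mu <- log(10) + sigma^2
  abs(plnorm(20, mu, sigma) - plnorm(5, mu, sigma) - 0.95)
}
optimize(f, lower = 0, upper = 1)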
#> $minimum
#> [1] 0.3355733
#>
#> $objective
#> [1] 1.160349e-05
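The same idea can be wrapped into a reusable function for any mode and interval. The sketch below is a hypothetical helper (lnorm_from_mode is not part of base R or any package). It swaps optimize() on an absolute difference for uniroot() on the signed difference, which solves coverage = p directly instead of minimising a function with a kink at the target. It assumes the mode lies inside (lo, hi) and that the coverage changes sign over the default sigma interval.

```r
# Hypothetical helper (not part of base R): find log-normal parameters
# with a given mode and probability mass p between lo and hi.
lnorm_from_mode <- function(mode, lo, hi, p = 0.95, interval = c(1e-3, 2)) {
  # Coverage minus target as a function of sigma; mu is pinned by the mode
  g <- function(sigma) {
    mu <- log(mode) + sigma^2
    plnorm(hi, mu, sigma) - plnorm(lo, mu, sigma) - p
  }
  # uniroot() requires g to have opposite signs at the interval endpoints
  sigma <- uniroot(g, interval, tol = 1e-10)$root
  c(meanlog = log(mode) + sigma^2, sdlog = sigma)
}

params <- lnorm_from_mode(mode = 10, lo = 5, hi = 20)
params

# Draw samples, much like rnorm() or runif()
set.seed(1)
x <- rlnorm(1000, params[["meanlog"]], params[["sdlog"]])
```

For mode 10 and the interval (5, 20), this should land on essentially the same parameters as the optimize() result.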