I need to simulate a roughly normally distributed sample of size 500 in R. 10% of the values (i.e., 50) should be larger than 50; the rest should be below 50 but still larger than 0.
I'm kind of stuck... any help is highly appreciated!
There is a way to do this. First, draw 502 values (two more than needed, because the minimum and maximum are dropped in a later step):
x <- rnorm(502)
Now normalize the output so that the minimum is 0 and maximum is 1:
x <- x - min(x)
x <- x / max(x)
Since the 0 and 1 are not random, we remove them from the sample:
x <- x[-c(which.min(x), which.max(x))]
Now we multiply the result by whichever number causes 10% of the sample to be greater than 50. We can use optimize for this:
f <- function(a) abs(sum((a * x) > 50)/length(x) - 0.1)
x <- optimize(f, c(0, 100))$minimum * x
This gives us what appears to be a normally distributed sample:
hist(x)
And exactly 10% of the samples are above 50:
length(x)
#> [1] 500
sum(x > 50)
#> [1] 50
None of the samples are less than 0:
min(x)
#> [1] 6.299734
And a Shapiro-Wilk test does not reject normality (p > 0.05), so the result is not detectably different from a normal distribution:
shapiro.test(x)
#>
#> Shapiro-Wilk normality test
#>
#> data: x
#> W = 0.99769, p-value = 0.7275
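The scale factor does not have to be searched for. The function passed to optimize is a step function, so the search is not guaranteed to land on an exact zero for every sample. A minimal sketch of a direct alternative, assuming no ties in the sample: pick a cutoff halfway between the 450th and 451st sorted values and scale so that the cutoff maps to 50. The names n, k, s and cutoff, and the seed, are illustrative and not part of the answer above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 500     # desired sample size
k <- 50      # number of values that should exceed 50

# Same preparation as above: draw n + 2 values, rescale to [0, 1],
# then drop the non-random endpoints 0 and 1
x <- rnorm(n + 2)
x <- (x - min(x)) / (max(x) - min(x))
x <- x[-c(which.min(x), which.max(x))]

# Put the cutoff between the (n - k)th and (n - k + 1)th sorted values,
# so exactly k values lie above it
s <- sort(x)
cutoff <- (s[n - k] + s[n - k + 1]) / 2

# Scale so that the cutoff lands on 50
x <- x * 50 / cutoff

sum(x > 50)  # number of values above 50
min(x) > 0   # all values are positive
```

Because the cutoff sits strictly between two neighbouring sample values, the count above 50 is fixed by construction rather than by a numerical search.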
Addendum
Incidentally, if you only need a single sample, a quick alternative is:
set.seed(4)
x <- rnorm(500, 35.3, 12)
Here, x is drawn from a normal distribution with mean 35.3 and standard deviation 12; with this seed it has a minimum value of 1.22 and exactly 50 elements over 50.
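A seed like this has to be found by trial. A rough sketch of such a search, assuming the same mean and standard deviation as in the addendum; find_seed and its max_seed argument are hypothetical names introduced here for illustration:

```r
# Hypothetical helper: try seeds 1, 2, ... until a draw meets both constraints
find_seed <- function(n = 500, mean = 35.3, sd = 12, max_seed = 10000) {
  for (seed in seq_len(max_seed)) {
    set.seed(seed)
    x <- rnorm(n, mean, sd)
    # Exactly 10% of the values above 50, and none at or below 0
    if (sum(x > 50) == n / 10 && min(x) > 0) {
      return(seed)
    }
  }
  NA_integer_  # no suitable seed found within max_seed tries
}

seed <- find_seed()
set.seed(seed)
x <- rnorm(500, 35.3, 12)
```

Note that a sample selected this way is conditioned on the constraints, so it is only suitable when a single illustrative data set is needed.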