
R - Normal distribution with top 10% larger than specific value


I need to simulate a roughly normally distributed sample of size 500 in R. 10% of the values (i.e., 50 of them) should be larger than 50; the rest should be below 50 but still larger than 0.

I'm kind of stuck... any help is highly appreciated!


Solution

  • There is a way to do this. First, draw 502 samples, two more than needed, because the minimum and maximum are discarded in a later step:

    x <- rnorm(502)
    

    Now rescale the output so that the minimum is 0 and the maximum is 1:

    x <- x - min(x)
    x <- x / max(x)
    

    The resulting 0 and 1 are fixed endpoints rather than random values, so we remove them from the sample, leaving 500 values:

    x <- x[-c(which.min(x), which.max(x))]
    

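    Now we multiply the result by a factor chosen so that 10% of the sample is greater than 50. We can use optimize to find it. Note that the objective f is a step function, so optimize is not guaranteed to land on an exact solution; the counts are checked further down.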

    f <- function(a) abs(sum((a * x) > 50)/length(x) - 0.1)
    x <- optimize(f, c(0, 100))$minimum * x
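
    The scale factor can also be computed directly instead of searched for. The following is a minimal sketch, not part of the original answer: it places the cut-off halfway between the 450th and 451st smallest values, so that exactly 50 values end up above 50 (assuming no ties, which holds almost surely for continuous draws). The names z, s, cutoff, and y, as well as the seed, are illustrative choices.

    ```r
    set.seed(1)                              # assumption: any seed, only for reproducibility
    n_above <- 50                            # how many values should exceed 50

    z <- rnorm(502)
    z <- (z - min(z)) / (max(z) - min(z))    # rescale to [0, 1]
    z <- z[-c(which.min(z), which.max(z))]   # drop the fixed endpoints 0 and 1

    s <- sort(z)
    k <- length(z) - n_above                 # 450 values must stay at or below 50
    cutoff <- (s[k] + s[k + 1]) / 2          # midpoint between the 450th and 451st value

    y <- z * 50 / cutoff                     # the cut-off is mapped to exactly 50

    length(y)
    #> [1] 500
    sum(y > 50)
    #> [1] 50
    min(y) > 0
    #> [1] TRUE
    ```

    Because the factor comes from the order statistics, the count above 50 is exact by construction and does not depend on a numerical search converging.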
    

    This gives us what appears to be a normally distributed sample:

    hist(x)
    

    And exactly 10% of the samples are above 50:

    length(x)
    #> [1] 500
    
    sum(x > 50)
    #> [1] 50
    

    None of the samples are less than 0:

    min(x)
    #> [1] 6.299734
    

    And a Shapiro-Wilk test does not reject normality (p = 0.73), so the result is not detectably different from a normal distribution:

    shapiro.test(x)
    #> 
    #>  Shapiro-Wilk normality test
    #> 
    #> data:  x
    #> W = 0.99769, p-value = 0.7275
    

    Addendum

    Incidentally, if you only need a single sample, a quick alternative is to use a seed and parameters that happen to satisfy the constraints:

    set.seed(4)
    x <- rnorm(500, 35.3, 12)
    

    Here, x is drawn from a normal distribution with mean 35.3 and standard deviation 12; with this seed, its minimum value is 1.22 and exactly 50 elements are over 50.
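
    One way to arrive at parameters close to these, and at a seed that works, is sketched below. This is an illustration under stated assumptions, not necessarily how the values in the addendum were chosen. It puts the 90th percentile of the distribution at 50 and, as an assumption, puts zero three standard deviations below the mean, which gives a mean of about 35 and a standard deviation of about 11.7. It then tries seeds until a draw meets both constraints. The names k_low, x_try, and found, and the seed range, are hypothetical.

    ```r
    n       <- 500
    n_above <- 50                 # number of values that should exceed the target
    target  <- 50
    k_low   <- 3                  # assumption: zero lies 3 standard deviations below the mean

    # Solve mu + qnorm(0.9) * sigma = target together with mu = k_low * sigma
    sigma <- target / (qnorm(1 - n_above / n) + k_low)
    mu    <- k_low * sigma

    # A single draw rarely hits the count exactly, so search over seeds
    found <- NA
    for (seed in 1:10000) {
      set.seed(seed)
      x_try <- rnorm(n, mu, sigma)
      if (sum(x_try > target) == n_above && min(x_try) > 0) {
        found <- seed             # x_try now holds a sample meeting both constraints
        break
      }
    }

    c(mu = mu, sigma = sigma)
    found
    ```

    The sample is still an ordinary normal draw; the seed search only selects a draw that happens to satisfy the constraints, so any later change to n, mu, or sigma requires a new search.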