r distribution normal-distribution binning

generate normal distribution with exactly N elements in Y bins


I'll probably want to hit myself over the head for not getting this:

How do I generate a vector with the expected heights of a normal distribution over Y bins (nbins in the code below), with exactly N elements in total?

Like so, in the below picture:

  • Y or nbins = 15
  • N or nstat = 77
  • ... should return something like: c(1,1,2,4, ...)

[image: example q sort]

I know I could draw rnorm(77), but that will never be exactly normal, and looping over 10,000 iterations or so seems like overkill.

So I tried using qnorm for that purpose, but I have a hunch that:

  1. something is wrong with the code below
  2. there has to be an easier, more elegant way

Here is what I got:

nbins <- 15
nstat <- 77

item.pos <- qnorm( # to the left of which value lies...
  1:(nstat) / (nstat+1)# ... the n-statement?
  # using nstat + 1 because we want midpoints, not cutoffs for later
)

bins <- cut(
  x = item.pos,
  breaks = nbins,
  ordered_result = TRUE
)

height <- summary(bins)  # number of items falling into each bin
height <- as.numeric(height)  # convert the counts, not the factor codes in bins

Solution

  • If your data range from -2 to 2 with 15 intervals and the sample size is 77, I would suggest the following to get the expected heights of the 15 intervals:

    rn <- dnorm(seq(-2, 2, length = 15)) / sum(dnorm(seq(-2, 2, length = 15))) * 77
    rn
    #  [1] 1.226486 2.084993 3.266586 4.716619 6.276462 7.697443 8.700123 9.062576 8.700123 7.697443
    # [11] 6.276462 4.716619 3.266586 2.084993 1.226486
    

    The barplot of this looks like:

    barplot(height = rn, names.arg = round(seq(-2, 2, length = 15), 2))
    


    So, in your sample of 77, you would expect 1.226486 cases in the first bin, 2.084993 cases in the second, and so on. It's difficult to generate a vector as you described at the beginning, because the values above are not integers.
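
One possible way around the non-integer problem is largest-remainder rounding: take the floor of each expected height, then give the leftover items to the bins with the largest fractional parts. This is a minimal sketch and not part of the answer above. The names mids, expected, heights, leftover, and top are made up for illustration, and the -2 to 2 range is the same assumption the answer makes.

```r
nbins <- 15
nstat <- 77

# Expected (non-integer) heights, computed as in the answer above
mids <- seq(-2, 2, length.out = nbins)
expected <- dnorm(mids) / sum(dnorm(mids)) * nstat

# Start from the floor of every expected height
heights <- floor(expected)

# Items still unassigned after flooring
leftover <- nstat - sum(heights)

# Give one extra item to each of the bins with the largest fractional parts
if (leftover > 0) {
  top <- order(expected - heights, decreasing = TRUE)[seq_len(leftover)]
  heights[top] <- heights[top] + 1
}

heights
sum(heights)  # equals nstat by construction
```

The result is an integer vector of length nbins whose sum is exactly nstat. When several fractional parts are tied, order() decides which bin gets the extra item, so the output is not guaranteed to be symmetric for every combination of nbins and nstat.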