Tags: random, probability, normal-distribution

Approximating Normal Distribution by adding Random Numbers


I would like to generate some random numbers which are normally distributed. It’s not mission critical, so a simple algorithm will suffice. I would then like to supply my own mean and standard deviation.

From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding random numbers together.

For example:

rand()+rand()+rand()+rand()+rand()+rand()

where rand() returns a uniformly distributed random number from 0 to 1, should be a reasonable approximation. (I am aware that technically it’s 0 ≤ rand() < 1.)

The expected mean is 6*0.5 = 3, so I can get to the desired mean with something like this:

(rand()+rand()+rand()+rand()+rand()+rand()-3) + mean

but what would the standard deviation be?

Once I know that, would setting an arbitrary standard deviation simply be a matter of multiplying?

Update

Experimentally, I have found that

(rand()+rand()+rand()+rand()+rand()+rand()-3)*sqrt(2)*sd+mean

gives me a set of data with the desired standard deviation and mean. I have tested this in a PostgreSQL database with 10 million rows, using the stddev() and avg() aggregate functions, and typical results match the targets to within 2 decimal places, which isn’t too bad.

I have no idea why sqrt(2) is involved …
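For reference, here is a minimal sketch of that check in Python rather than SQL (the PostgreSQL test above used random(), stddev() and avg(); the target mean and sd here are just illustrative values):

import random
import statistics
from math import sqrt

mean, sd = 10.0, 2.5   # illustrative target parameters

samples = [
    (sum(random.random() for _ in range(6)) - 3) * sqrt(2) * sd + mean
    for _ in range(1_000_000)
]

print(statistics.fmean(samples))   # close to 10.0
print(statistics.stdev(samples))   # close to 2.5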

Solution

OK, thanks to Severin Pappadeux below, I have an answer.

I can generate a reasonable result with:

(rand() + … + rand() - n/2) / sqrt(n/12) * sd + mean

where n is the number of rand() calls I am prepared to make.
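As a sketch of that general formula in Python (the function name approx_gauss and the default n are my own choices, not part of the answer below):

import random
from math import sqrt

def approx_gauss(mean, sd, n=12):
    # Sum n uniforms, centre on n/2, divide by sqrt(n/12) to get ~N(0,1),
    # then scale and shift to the requested standard deviation and mean.
    total = sum(random.random() for _ in range(n))
    return (total - n / 2) / sqrt(n / 12) * sd + mean

# e.g. approx_gauss(100, 15) gives values roughly distributed as N(100, 15)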


Solution

  • From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding random numbers together.

    That is a correct approach. The only caveat is that you need to think carefully about the tails you're missing: a sum of six uniforms can never land more than about 4.2 standard deviations (3/sqrt(1/2)) from its mean, so the extreme tails are cut off.

    Let's consider making N(0,1) - a Gaussian with mean 0 and standard deviation 1. Any other Gaussian N(μ, σ) is then just a scale and shift of N(0,1).

    So, the proposed algorithm for G(0,1) (which is an approximation of N(0,1)) is

    G(0,1) = U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1)
    

    where U(0,1) is a uniformly distributed random number in the [0,1) range. Let's take a look at the mean.

    E(G(0,1)) = 6*E(U(0,1)) = 6*0.5 = 3
    

    which is exactly what you've got. So, to get a 0 mean for G(0,1) we have to subtract 3. Let's now check the variance of G(0,1); we have to make it equal to 1.

    V(G(0,1)) = 6*V(U(0,1)) = 6*(1/12) = 1/2
    

    The standard deviation (σ) is the square root of the variance, so to get it to 1 you have to divide by sqrt(1/2).

    So, the final expression would be

    G(0,1) = (U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) - 3)/sqrt(1/2)
    

    and it is a reasonably good approximation of N(0,1).
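    A quick numeric check of those figures (a Python sketch; the sample size is arbitrary, just large enough for the estimates to settle):

    import random
    import statistics
    from math import sqrt

    raw = [sum(random.random() for _ in range(6)) for _ in range(500_000)]
    print(statistics.fmean(raw))      # close to 3, the mean of the raw sum
    print(statistics.pvariance(raw))  # close to 1/2, the variance of the raw sum

    g = [(s - 3) / sqrt(1 / 2) for s in raw]
    print(statistics.fmean(g))        # close to 0
    print(statistics.pstdev(g))       # close to 1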

    I have no idea why sqrt(2) is involved …

    Dividing by sqrt(1/2) is the same as multiplying by sqrt(2) - now I hope you know where it came from.

    A simple corollary: for a sum of some other number n of U(0,1) values, the normalizing divisor becomes sqrt(n/12).

    Another simple corollary: because V(U(0,1)) is equal to 1/12, summing twelve U(0,1) values requires no multiplier at all:

    G(0,1) = Sum_1^12 U(0,1) - 6
    

    This recipe is often cited in older sampling books and papers.
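    In Python the twelve-sum recipe is just (a sketch; the name gauss12 is mine):

    import random

    def gauss12():
        # The sum of 12 uniforms has mean 6 and variance 12 * (1/12) = 1,
        # so subtracting 6 already gives an approximate N(0,1) sample.
        return sum(random.random() for _ in range(12)) - 6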

    You might also want to take a look at the related Irwin-Hall distribution and Bates distribution.

    UPDATE

    I've thought about a simplification of the approach. Suppose we want to sum an even number of U(0,1) values, so n = 2m. Again, treating G(0,1) as an approximation of N(0,1):

    G(0,1) = (Sum_1^2m U(0,1) - m ) / sqrt(m/6)
    

    Let's rewrite it as

    G(0,1) = (Sum_1^m U(0,1) - (m - Sum_1^m U(0,1)))/sqrt(m/6) =
           = (Sum_1^m U(0,1) - Sum_1^m(1 - U(0,1)))/sqrt(m/6)
    

    Because 1 - U(0,1) has the same distribution as U(0,1), we can write G(0,1) in a symmetric form

    G(0,1) = (Sum_1^m U(0,1) - Sum_1^m U(0,1))/sqrt(m/6) =
           = Sum_1^m (U(0,1) - U(0,1)) / sqrt(m/6)
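
    Here is a sketch of this symmetric form in Python (the two U(0,1) draws in each pair are independent; the name gauss_symmetric is mine):

    import random
    from math import sqrt

    def gauss_symmetric(m=6):
        # Each U(0,1) - U(0,1) difference has mean 0 and variance 1/6,
        # so the sum of m of them divided by sqrt(m/6) has variance 1.
        total = sum(random.random() - random.random() for _ in range(m))
        return total / sqrt(m / 6)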