Search code examples
clojure

Generate bell curve data using Clojure


How to generate a normal distribution of values using Clojure? Actually not necessarily a true normal distribution, but one that can be skewed.

As an example I would like to create a function that outputs a generated (pseudo random) number for the concentration of oxygen in air by volume. The lowest possible output should be 19.5% and the greatest possible 23.5%, while the modal value should be 20.95%. The function should work for this 'skewed normal' distribution where the lower portion of the tail has a range of 1.45% and the higher portion a range of 2.55%.


Solution

  • There is a simple way to get any distribution you want, if you can plot the function y=f(x) that describes the probability density function.

    For a Gaussian, this function is f(x)=exp( -(x-m)^2 / (2 * s^2) ) / sqrt(2pi s^2) (see https://en.wikipedia.org/wiki/Gaussian_function)

    Where m is the mean of x and s is the std dev of x.

    For a "normal" gaussian where m=0 and s=1, there are "almost never" values outside of +/-3 (exact amount left as an exercise to the reader). Given this approximation, the simplest way to get a gaussian dist is to generate an x floating point value in the interval [-3..3] AND a y value in the interval [0..1]. Then calculate f(x) as above: exp(...) etc. Then, IFF y<=f(x), use the value x as your random value. Otherwise, discard both x and y and start over.

    While this technique throws away some (or many) values, it is very simple and bulletproof.

    You can use a similar method for your "skewed gaussian" approximation, just define your own f(x) as you described. For a really simple approx, you could even use a straight line approximation from (19.5,0) to (20.95,1) to (23.5,0), where this makes a triangular shape for f(x). In this case, draw x in the interval [19.5..23.5] and calculate the straight-line formula for the left & right halves for f(x). Draw y in [0..1] as before.

    I just found that Wikipedia describes this in more detail: https://en.wikipedia.org/wiki/Rejection_sampling


    Update:

    If you just want Gaussian random variables (or other common distributions), you can use the Apache Commons Math library.