
Given a min, max, mean and standard deviation, generate a random value from a distribution using Java


Given that min = 0.00, max = 1400.00, mean = 150.50 and standard deviation = 25.00, how does one generate a random value based on these statistics? My understanding is that this distribution is skewed, but I am not sure whether it is log-normal. As far as I understand, the following piece of code returns a value from a normal distribution:

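private static final java.util.Random rand = new java.util.Random();

private static double generateValue(double mean, double stdDev) {
    // nextGaussian() draws from N(0, 1); scaling and shifting gives N(mean, stdDev^2)
    return rand.nextGaussian() * stdDev + mean;
}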

Solution

  • One way to do this is to find a naturally bounded distribution whose mean and variance are determined by two parameters. That reduces the problem from meeting four constraints (min, max, mean, and s.d.) simultaneously to solving two equations (for mean and s.d.) in two unknowns. The beta distribution meets those needs. It's defined on the range [0, 1], but it can easily be adjusted to your problem by scaling results by 1400. I used Wikipedia's beta distribution article to refresh my memory on the formulae for the mean and variance of a beta, and then headed over to the solver at Wolfram|Alpha to enter those formulae using a mean of 150.5/1400 and a standard deviation of 25/1400. This yielded α=32.237057 and β=267.642543, so you can meet your requirements by generating values X = 1400 * beta(α, β) with those parameter values.
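The two moment equations can also be solved in closed form (the method-of-moments fit for a beta distribution), which avoids the numeric solver. Since a beta with parameters a and b has mean m = a/(a+b) and variance v = m(1-m)/(a+b+1), it follows that a+b = m(1-m)/v - 1. The sketch below is a swapped-in alternative to the Wolfram|Alpha step, not the answer's own method; it assumes the lower bound is 0 as in the question, and the class and method names are hypothetical.

```java
/**
 * Hypothetical helper: method-of-moments fit of upper * Beta(alpha, beta)
 * to a target mean and standard deviation on [0, upper].
 */
class BetaParams {

    /** Returns {alpha, beta} so that upper * Beta(alpha, beta) has the given mean and sd. */
    static double[] fit(double mean, double sd, double upper) {
        double m = mean / upper;            // mean rescaled to [0, 1]
        double s = sd / upper;              // sd rescaled to [0, 1]
        double v = s * s;                   // variance on [0, 1]
        // From v = m(1 - m) / (a + b + 1): a + b = m(1 - m) / v - 1
        double total = m * (1 - m) / v - 1;
        if (!(m > 0 && m < 1) || total <= 0) {
            // A beta on [0, 1] requires 0 < m < 1 and v < m(1 - m)
            throw new IllegalArgumentException("no beta distribution matches these moments");
        }
        return new double[] { m * total, (1 - m) * total };
    }

    public static void main(String[] args) {
        double[] ab = fit(150.5, 25.0, 1400.0);
        System.out.printf("alpha = %.6f, beta = %.6f%n", ab[0], ab[1]);
    }
}
```

Running `main` prints `alpha = 32.237057, beta = 267.642543`, matching the Wolfram|Alpha solution. The guard rejects requests whose variance is at least m(1 - m), since no beta distribution can be that spread out.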

    I haven't used Java for over 15 years and don't have it on my machine, so I used Python to confirm the parameterization:

    from scipy.stats import beta
    import math
    
    a = 32.237057
    b = 267.642543
    n = 100_000_000
    
    mean, var = beta.stats(a, b, moments='mv')
    print( f"mean = {mean * 1400}, std dev = {math.sqrt(var) * 1400}" )
    

    which produces

    mean = 150.50000000000003, std dev = 25.000000000000004
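For readers without SciPy, the same moment check can be sketched in plain Java using the standard closed-form beta moments, mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)), scaled by 1400. The class and method names are illustrative only.

```java
/** Illustrative Java counterpart of the SciPy moment check above. */
class BetaMoments {

    /** Mean of Beta(a, b): a / (a + b). */
    static double mean(double a, double b) {
        return a / (a + b);
    }

    /** Standard deviation of Beta(a, b): sqrt(ab / ((a + b)^2 (a + b + 1))). */
    static double stdDev(double a, double b) {
        double s = a + b;
        return Math.sqrt(a * b / (s * s * (s + 1)));
    }

    public static void main(String[] args) {
        double a = 32.237057, b = 267.642543, scale = 1400.0;
        // Scale both moments from [0, 1] back to [0, 1400]
        System.out.println("mean = " + mean(a, b) * scale
                + ", std dev = " + stdDev(a, b) * scale);
    }
}
```

The printed values should agree with 150.5 and 25 up to floating-point rounding in the last few digits.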

    I'd say that's about as close as you can ask for using floating point arithmetic. I then tried actual generation:

    r = beta.rvs(a, b, size=n) * 1400
    print( f"For n={n} min and max are {r.min()} and {r.max()}, respectively")
    

    with output:

    For n=100000000 min and max are 45.22697720545599 and 327.87270125710194, respectively

    You might consider the empirical maximum to be low, but note that 1400 is just shy of 50σ above the mean. Chebyshev's inequality gives a very weak non-parametric upper bound on the probability of getting such a value: with k = (1400 - 150.5)/25 ≈ 49.98 standard deviations, the bound is 1/k² ≈ 1/2498, or roughly 1/2500. In many cases, including this one, the actual probability is far smaller than Chebyshev's bound. In other words, the probability of getting an outcome approaching 1400 is essentially zero.
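The Chebyshev figure is just 1/k², where k is the distance from the mean to 1400 measured in standard deviations. A minimal Java check of that arithmetic, with hypothetical method names:

```java
/** Hypothetical helper: Chebyshev bound on landing k standard deviations from the mean. */
class TailBound {

    /** Number of standard deviations between the mean and a value x. */
    static double sigmas(double x, double mean, double sd) {
        return (x - mean) / sd;
    }

    /** Chebyshev: P(|X - mean| >= k * sd) <= 1 / k^2 for any finite-variance distribution, k > 0. */
    static double chebyshevBound(double k) {
        return 1.0 / (k * k);
    }

    public static void main(String[] args) {
        double k = sigmas(1400.0, 150.5, 25.0);
        System.out.printf("k = %.2f sigma, Chebyshev bound = %.6f (about 1/%.0f)%n",
                k, chebyshevBound(k), k * k);
    }
}
```

This prints `k = 49.98 sigma, Chebyshev bound = 0.000400 (about 1/2498)`. The bound holds for every distribution with this mean and standard deviation, which is exactly why it is so loose for a concentrated beta.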

    A quick Google search turned up the BetaDistribution class in the Apache Commons Math library, so it should be straightforward to map this approach to Java.
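If adding Apache Commons Math (version 3) is acceptable, something like `new BetaDistribution(alpha, beta).sample() * 1400` from `org.apache.commons.math3.distribution` is likely all you need. If you would rather stay on the standard library, one well-known construction is that G1 / (G1 + G2) is Beta(α, β) distributed when G1 ~ Gamma(α) and G2 ~ Gamma(β) are independent. The sketch below samples the gammas with the Marsaglia-Tsang method. This is a swapped-in technique rather than what the answer or Commons Math does internally, and the class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical stdlib-only sampler for scale * Beta(alpha, beta) via a ratio of gammas. */
class BetaSampler {
    private final Random rng;

    BetaSampler(long seed) {
        rng = new Random(seed);
    }

    /** Gamma(shape, 1) via Marsaglia and Tsang; shapes below 1 use the U^(1/shape) boost. */
    double gamma(double shape) {
        if (shape < 1.0) {
            return gamma(shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x, v;
            do {                                  // need 1 + c*x > 0
                x = rng.nextGaussian();
                v = 1.0 + c * x;
            } while (v <= 0.0);
            v = v * v * v;
            double u = rng.nextDouble();
            if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;   // fast squeeze test
            if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) return d * v;
        }
    }

    /** One draw from scale * Beta(alpha, beta), always within [0, scale]. */
    double nextScaledBeta(double alpha, double beta, double scale) {
        double g1 = gamma(alpha);
        double g2 = gamma(beta);
        return scale * g1 / (g1 + g2);
    }

    public static void main(String[] args) {
        BetaSampler s = new BetaSampler(42L);
        int n = 1_000_000;
        double sum = 0, sumSq = 0;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double x = s.nextScaledBeta(32.237057, 267.642543, 1400.0);
            sum += x;
            sumSq += x * x;
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double mean = sum / n;
        double sd = Math.sqrt(sumSq / n - mean * mean);
        System.out.printf("mean = %.3f, sd = %.3f, min = %.3f, max = %.3f%n", mean, sd, min, max);
    }
}
```

With a million draws the printed mean and standard deviation should land close to 150.5 and 25, and every value stays inside [0, 1400] by construction; the exact digits depend on the seed.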