python-3.x scipy normal-distribution

Creating an array of numbers whose frequency resembles a bell curve


I want to create an array A = [1, 1, 2, 2, 2, 5, 5, 5, ...] with numbers from [a, b] such that:

  • A histogram whose Y-axis is the frequency of each number in the array and whose X-axis is [a, b] resembles a bell curve.

    [Image: bell curve]

  • The sum of frequency(i) * i over all i in [a, b] is approximately equal to a large number K.

Many functions are available in Python, like numpy.random.normal or scipy.stats.truncnorm, but I am not able to fully understand their use and how they can help me create such an array.


Solution

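  • The first point is easy. For the second point, I'm assuming you want the "integral" of freq(x) * x, i.e. the total over the whole range, to be close to K. Making each individual x * freq(x) ~ K is mathematically impossible, since freq(x) would then have to be K / x, which is not bell-shaped. You can hit the total by adjusting the sample size.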

    First step: to get bell-curve-shaped numbers between a and b, use scipy.stats.truncnorm (the samples are floats; they are cast to integers further down). From the docs:

    Notes

    The standard form of this distribution is a standard normal truncated to the range [a, b]. Notice that a and b are defined over the domain of the standard normal. To convert clip values for a specific mean and standard deviation, use:

    a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
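
To make the conversion concrete, here is a minimal sketch with made-up numbers (the mean, standard deviation, and clip values below are assumptions chosen only for illustration). It converts the clip values to standard-normal units and then checks that the frozen distribution is supported on the original clip range:

```python
from scipy.stats import truncnorm

# Hypothetical values, chosen only for illustration.
my_mean, my_std = 50.0, 10.0
myclip_a, myclip_b = 30.0, 80.0

# Convert the clip values to standard-normal units, as the docs describe.
a_std = (myclip_a - my_mean) / my_std
b_std = (myclip_b - my_mean) / my_std

dist = truncnorm(a_std, b_std, loc=my_mean, scale=my_std)

# The support of the frozen distribution is the original clip range.
lo, hi = dist.support()
print(a_std, b_std, lo, hi)
```

The point to notice is that truncnorm never sees myclip_a and myclip_b directly: it only receives the standardized bounds plus loc and scale.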
    

    Take a standard normal truncated to [-3, 3], i.e. three standard deviations on each side, so the curve keeps its full bell shape. Then set the mean and standard deviation so that [-3, 3] maps onto [a, b]:

    import numpy as np
    from scipy.stats import truncnorm

    a, b = 10, 200
    loc = (a + b) / 2      # mean at the midpoint of [a, b]
    scale = (b - a) / 6    # [a, b] spans the mean +/- 3 standard deviations
    f = truncnorm(-3, 3, loc=loc, scale=scale)
    

    Now, the expected frequency of i among n samples is about n * pdf(i), so sum(freq(i) * i) ~ n * sum(pdf(i) * i). Setting this equal to K gives n = K / sum(pdf(i) * i), which can be computed as:

    K = 200000
    i = np.arange(a, b + 1)          # every integer in [a, b]
    n = int(K / i.dot(f.pdf(i)))     # sample size that brings the total close to K
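
As a sanity check on this step, sum(pdf(i) * i) over the integers is a unit-width Riemann sum for the mean of the distribution, so it should land very close to f.mean(). The sketch below repeats the setup so it runs on its own; the names riemann_mean, exact_mean, n_from_sum and n_from_mean are introduced here for illustration only:

```python
import numpy as np
from scipy.stats import truncnorm

a, b = 10, 200
K = 200000
f = truncnorm(-3, 3, loc=(a + b) / 2, scale=(b - a) / 6)

i = np.arange(a, b + 1)
riemann_mean = i.dot(f.pdf(i))   # sum(pdf(i) * i), a unit-width Riemann sum
exact_mean = f.mean()            # mean of the continuous distribution

# Both give essentially the same sample size.
n_from_sum = int(K / riemann_mean)
n_from_mean = int(K / exact_mean)
print(riemann_mean, exact_mean, n_from_sum, n_from_mean)
```

In other words, the sample size is roughly K divided by the mean of the distribution, which here is the midpoint of [a, b].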
    

    Now generate random integer samples and check the result:

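    import matplotlib.pyplot as plt

    # np.int was removed in NumPy 1.24; the builtin int performs the same truncating cast
    samples = f.rvs(size=n).astype(int)
    plt.hist(samples, bins=20)
    # Integer-aligned bin edges, so counts[j] is the frequency of the value a + j
    counts = np.histogram(samples, bins=np.arange(a, b + 2))[0]
    print(counts.dot(np.arange(a, b + 1)))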
    
    >> 200315  (varies from run to run, since the samples are random)
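
Putting the steps together, a possible end-to-end sketch looks like this. The function name bell_curve_array and the seed parameter are hypothetical, added only for this illustration. It also departs from the code above in two ways: it rounds with np.rint instead of truncating, because truncation shifts every sample down by about half a unit on average, and it uses samples.sum() for the check, which equals sum(freq(i) * i) by definition:

```python
import numpy as np
from scipy.stats import truncnorm

def bell_curve_array(a, b, K, seed=None):
    """Return integers in [a, b] with a bell-shaped histogram and a sum near K."""
    loc, scale = (a + b) / 2, (b - a) / 6
    f = truncnorm(-3, 3, loc=loc, scale=scale)
    # Sample size such that n * mean is close to K.
    n = int(round(K / f.mean()))
    rng = np.random.default_rng(seed)
    samples = f.rvs(size=n, random_state=rng)
    # Round to the nearest integer; values stay inside [a, b].
    return np.rint(samples).astype(int)

A = bell_curve_array(10, 200, 200000, seed=0)
print(len(A), A.min(), A.max(), A.sum())
```

The sum will not hit K exactly, since the samples are random; the relative deviation shrinks as K, and therefore n, grows.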
    
