I want to create an array A, e.g. [1, 1, 2, 2, 2, 5, 5, 5, ...], with numbers from [a, b] such that
a histogram with the frequency of each number on the Y-axis and [a, b] on the X-axis resembles a bell curve.
Many functions are available in Python, such as numpy.random.normal or scipy.stats.truncnorm, but I am not able to fully understand their use or how they can help me create such an array.
The first point is easy. For the second point, I'm assuming you want the "integral" of freq * x, that is, the sum of freq(x) * x over all x, to be close to K (making each x * freq(x) ~ K individually is mathematically impossible). You can do that by adjusting the sample size.
First step: to get bell-curve-shaped integer numbers between a and b, use scipy.stats.truncnorm. From the docs:
Notes
The standard form of this distribution is a standard normal truncated to the range [a, b] --- notice that a and b are defined over the domain of the standard normal. To convert clip values for a specific mean and standard deviation, use::
a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
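To make the conversion in the docs concrete, here is a minimal sketch. The clip range, mean, and standard deviation are hypothetical values picked for illustration, and the names clip_lo, clip_hi, a_std, b_std, and dist are not from the original post.

```python
from scipy.stats import truncnorm

# Hypothetical clip range, mean and standard deviation (not from the original post).
clip_lo, clip_hi = 10, 200
my_mean, my_std = 105.0, 20.0

# Express the clip values in standard-normal units, as the docs describe.
a_std = (clip_lo - my_mean) / my_std
b_std = (clip_hi - my_mean) / my_std

dist = truncnorm(a_std, b_std, loc=my_mean, scale=my_std)
draws = dist.rvs(size=1000, random_state=0)

# Every draw should fall inside the requested clip range.
print(draws.min() >= clip_lo, draws.max() <= clip_hi)
```

The point is that the first two arguments of truncnorm are always in units of standard deviations from the mean, not in the units of your data.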
Truncate a standard normal to the range [-3, 3], which covers about 99.7% of its mass, so the curve keeps a nice bell shape. Then adjust the mean and standard deviation so that [-3, 3] maps onto [a, b]:
import numpy as np
from scipy.stats import truncnorm
a, b = 10, 200
loc = (a + b) / 2    # centre of [a, b]
scale = (b - a) / 6  # so that loc - 3 * scale = a and loc + 3 * scale = b
f = truncnorm(-3, 3, loc=loc, scale=scale)
Now, since the expected frequency of each integer i is about n * pdf(i), we have sum(freq(i) * i) ~ n * sum(pdf(i) * i). Setting this equal to K gives n = K / sum(pdf(i) * i), which can be computed as:
K = 200000
i = np.arange(a, b + 1)         # every integer in [a, b]
n = int(K / i.dot(f.pdf(i)))    # sample size that makes the weighted sum ~ K
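As a sanity check on this formula, sum(pdf(i) * i) over the integers in [a, b] is a discrete approximation of the mean of the distribution, so n should come out close to K divided by the mean. A small sketch, assuming the same a, b, and K as above (the name weighted is introduced here for illustration):

```python
import numpy as np
from scipy.stats import truncnorm

a, b = 10, 200
K = 200000
f = truncnorm(-3, 3, loc=(a + b) / 2, scale=(b - a) / 6)

i = np.arange(a, b + 1)
weighted = i.dot(f.pdf(i))   # discrete approximation of the mean of f
n = int(K / weighted)

# The two values on each line should be close, so n is roughly K / mean.
print(weighted, f.mean())
print(n, int(K / f.mean()))
```

In other words, the target K fixes the sample size once the range [a, b] (and hence the mean) is chosen.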
Now generate integer random samples, and check function:
samples = f.rvs(size=n).astype(int)  # truncate each sample to an integer
import matplotlib.pyplot as plt
plt.hist(samples, bins=20)
plt.show()
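# Use explicit bin edges a, a+1, ..., b+1 so that bin k counts exactly the value a + k.
print(np.histogram(samples, bins=np.arange(a, b + 2))[0].dot(np.arange(a, b + 1)))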
>> 200315
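Putting the steps together, the whole recipe might be wrapped up as in the sketch below. The function name bell_curve_array and its seed parameter are hypothetical, and the sketch rounds to the nearest integer instead of truncating, which is a deliberate change from the code above rather than part of the original answer.

```python
import numpy as np
from scipy.stats import truncnorm

def bell_curve_array(a, b, K, seed=None):
    """Hypothetical helper: integers in [a, b], bell-shaped, with sum(freq(i) * i) ~ K."""
    f = truncnorm(-3, 3, loc=(a + b) / 2, scale=(b - a) / 6)
    i = np.arange(a, b + 1)
    n = int(K / i.dot(f.pdf(i)))          # sample size, as derived above
    draws = f.rvs(size=n, random_state=seed)
    # Assumption: round to the nearest integer instead of truncating.
    return np.rint(draws).astype(int)

A = bell_curve_array(10, 200, 200000, seed=0)
counts = np.bincount(A, minlength=201)        # counts[v] is the frequency of value v
total = counts.dot(np.arange(len(counts)))    # sum(freq(i) * i), same as A.sum()
print(len(A), total)
```

Truncating with astype(int) lowers each sample by about 0.5 on average, which pulls the weighted sum slightly below K; rounding avoids that bias. Either way the result is random, so expect the total to land near K rather than exactly on it.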