
Specifying a range for a log-normal distribution in Python


I am generating random samples from a log-normal distribution using NumPy's built-in function. Is it possible to specify a range in addition to the mean and sigma? For instance, I want random values between 0 and 0.5 with mean=0.2, sigma=0.5.

import numpy as np

r = np.random.lognormal(mean=0.2, sigma=0.5, size=(3, 3))
print(r)

Solution

  • Not directly, but you can generate more points and filter out those that fall outside your bounds. With the values you provided, however, only a small fraction of the points will qualify:

    np.random.seed(0)
    z = np.random.lognormal(mean=0.2, sigma=0.5, size=10_000)
    
    >>> np.sum(z < 0.5) / len(z)
    0.0346
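
    As a sanity check, that fraction matches the theoretical CDF. A quick computation with scipy.stats.lognorm (note that scipy parametrizes it with s=sigma and scale=exp(mean)):

    from scipy.stats import lognorm
    
    # P(Z < 0.5) for Z ~ logN(mean=0.2, sigma=0.5)
    p = lognorm.cdf(0.5, s=0.5, scale=np.exp(0.2))
    # p is about 0.037, consistent with the empirical 0.0346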
    

    As a side note, be aware that the parameters mean and sigma of np.random.lognormal() refer to the underlying normal distribution, not to the mean and std of the log-normal samples:

    np.random.seed(0)
    z = np.random.lognormal(mean=0.2, sigma=0.5, size=1000)
    y = np.log(z)
    
    >>> np.mean(z), np.std(z)
    (1.3886515119063163, 0.7414709414626542)
    
    >>> np.mean(y), np.std(y)
    (0.2018986489272414, 0.5034218384643446)
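
    Conversely, if you want the log-normal samples themselves to have a given mean and std, the standard moment-matching formulas recover the underlying parameters; a minimal sketch (the helper name is mine):

    def underlying_params(m, s):
        # invert the mean/std of logN back to mu, sigma of the underlying normal
        sigma2 = np.log(1 + (s / m) ** 2)
        return np.log(m) - sigma2 / 2, np.sqrt(sigma2)
    
    mu, sigma = underlying_params(1.3887, 0.7415)  # ~ (0.20, 0.50)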
    

    All that said, if you really want what you asked for, you could do:

    shape = 3, 3
    zmin, zmax = 0, 0.5
    
    n = np.prod(shape)  # total number of samples needed
    zc = np.array([])   # accepted samples so far
    while True:
        # draw a generous batch and keep only the in-range values
        z = np.random.lognormal(mean=0.2, sigma=0.5, size=n * 100)
        z = z[(zmin <= z) & (z < zmax)]
        zc = np.r_[zc, z]
        if len(zc) >= n:
            break
    r = zc[:n].reshape(shape)
    
    >>> r
    array([[0.34078793, 0.45366409, 0.40183988],
           [0.43387773, 0.46387512, 0.30535007],
           [0.44248787, 0.32316698, 0.48600577]])
    

    Note, however, that the loop above (which in this case completes on average in a single pass) may run forever if your bounds enclose a region of (near-)zero probability.
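
    If that is a concern, a small variant caps the number of batches; a sketch with a hypothetical max_tries parameter:

    def bounded_lognormal(mean, sigma, zmin, zmax, n, max_tries=1_000):
        # same rejection loop, but bail out after max_tries batches
        # instead of spinning forever on a (near-)empty region
        zc = np.array([])
        for _ in range(max_tries):
            z = np.random.lognormal(mean=mean, sigma=sigma, size=n * 100)
            zc = np.r_[zc, z[(zmin <= z) & (z < zmax)]]
            if len(zc) >= n:
                return zc[:n]
        raise ValueError("bounds reject (nearly) all samples")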

    Edit

    For efficiency, we can instead use scipy.stats.truncnorm. However, there seems to be an issue when loc, scale != 0, 1, so here we roll our own adjustment:

    # convert X ~ N(0, 1) to Y ~ N(u, s):
    # Y = X * s + u
    # want Y in [a, b]:
    # so X in [(a - u)/s, (b - u)/s]
    #
    # We want Z ~ logN(u, s) in [zmin, zmax]
    # Z = exp(Y) and
    # a, b = np.log((zmin, zmax))
    
    from scipy.stats import truncnorm
    
    def lognormal(a, b, loc=1, scale=1, size=1):
        a, b = np.log((a, b))
        ca, cb = (np.array((a, b)) - loc) / scale
        rv = truncnorm(a=ca, b=cb)
        z = np.exp(rv.rvs(size=size) * scale + loc)
        return z
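
    For what it's worth, truncnorm does accept loc and scale directly; the catch (and likely the issue alluded to above) is that its a and b bounds are always interpreted in standard-deviation units. A sketch of an equivalent variant (the name lognormal2 is mine):

    def lognormal2(a, b, loc=1, scale=1, size=1):
        # bounds must be standardized even when loc/scale are passed
        ca, cb = (np.log((a, b)) - loc) / scale
        return np.exp(truncnorm.rvs(a=ca, b=cb, loc=loc, scale=scale, size=size))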
    

    Example:

    zmin, zmax = 0, 0.5
    z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    

    Or, to suppress the rather noisy warning from np.log(0):

    import warnings
    
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=RuntimeWarning, message='divide by zero encountered in log') 
        z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    

    Either way, we now have:

    >>> np.min(z), np.max(z)
    (0.10787873738453256, 0.4999993445360746)
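
    As a further check, the samples can be compared against the theoretical truncated log-normal CDF with a KS test; a quick sketch:

    from scipy.stats import kstest, lognorm
    
    # CDF of logN(0.2, 0.5) truncated to [zmin, zmax]
    dist = lognorm(s=0.5, scale=np.exp(0.2))
    lo, hi = dist.cdf(zmin), dist.cdf(zmax)
    
    def trunc_cdf(x):
        return (dist.cdf(x) - lo) / (hi - lo)
    
    # a large p-value indicates the samples are consistent with the truncation
    kstest(z, trunc_cdf)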
    

    And we get the same distribution as one would get with the rejection approach above (r), here with size = 10_000:

    def lognormal_naive(a, b, loc=1, scale=1, size=1):
        zc = np.array([])
        while True:
            z = np.random.lognormal(mean=loc, sigma=scale, size=size * 100)
            z = z[(a <= z) & (z < b)]
            zc = np.r_[zc, z]
            if len(zc) >= size:
                break
        return zc[:size]
    
    r = lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
    
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(ncols=2, figsize=(7, 2))
    ax[0].hist(z, bins=100)
    ax[1].hist(r, bins=100);
    

    The timing of the truncnorm version, however, is much better:

    %timeit lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
    64.2 ms ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    2.06 ms ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)