
Specifying a range for a log-normal distribution in Python


I am generating random samples from a log-normal distribution using NumPy's built-in function. Is it possible to specify a range in addition to the mean and sigma? For instance, I want random values between 0 and 0.5 with mean=0.2, sigma=0.5.

import numpy as np

r = np.random.lognormal(mean=0.2, sigma=0.5, size=(3, 3))
print(r)

Solution

  • Not directly, but you can generate more points and filter out those that fall outside your bounds. With the values you provided, however, only a small fraction of the points will qualify:

    np.random.seed(0)
    z = np.random.lognormal(mean=0.2, sigma=0.5, size=10_000)
    
    >>> np.sum(z < 0.5) / len(z)
    0.0346
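
    As a sanity check, that fraction matches the theoretical CDF. A quick computation with scipy.stats.lognorm (note that scipy parametrizes it with s=sigma and scale=exp(mean)):

    from scipy.stats import lognorm
    
    # P(Z < 0.5) for Z ~ logN(mean=0.2, sigma=0.5)
    p = lognorm.cdf(0.5, s=0.5, scale=np.exp(0.2))
    # p is about 0.037, consistent with the empirical 0.0346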
    

    As a side note, be aware that the parameters mean and sigma of np.random.lognormal() refer to the underlying normal distribution, not to the mean and std of the log-normal samples:

    np.random.seed(0)
    z = np.random.lognormal(mean=0.2, sigma=0.5, size=1000)
    y = np.log(z)
    
    >>> np.mean(z), np.std(z)
    (1.3886515119063163, 0.7414709414626542)
    
    >>> np.mean(y), np.std(y)
    (0.2018986489272414, 0.5034218384643446)
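
    Conversely, if you want the log-normal samples themselves to have a given mean and std, the standard moment-matching formulas recover the underlying parameters; a minimal sketch (the helper name is mine):

    def underlying_params(m, s):
        # invert the mean/std of logN back to mu, sigma of the underlying normal
        sigma2 = np.log(1 + (s / m) ** 2)
        return np.log(m) - sigma2 / 2, np.sqrt(sigma2)
    
    mu, sigma = underlying_params(1.3887, 0.7415)  # ~ (0.20, 0.50)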
    

    All that said, if you really want what you asked for, you could do:

    shape = 3, 3
    zmin, zmax = 0, 0.5
    
    n = np.prod(shape)  # total number of samples needed
    zc = np.array([])   # accepted samples so far
    while True:
        # draw a generous batch and keep only the in-range values
        z = np.random.lognormal(mean=0.2, sigma=0.5, size=n * 100)
        z = z[(zmin <= z) & (z < zmax)]
        zc = np.r_[zc, z]
        if len(zc) >= n:
            break
    r = zc[:n].reshape(shape)
    
    >>> r
    array([[0.34078793, 0.45366409, 0.40183988],
           [0.43387773, 0.46387512, 0.30535007],
           [0.44248787, 0.32316698, 0.48600577]])
    

    Note, however, that the loop above (which in this case completes on average in a single pass) may run forever if your bounds enclose a region of (near-)zero probability.
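
    If that is a concern, a small variant caps the number of batches; a sketch with a hypothetical max_tries parameter:

    def bounded_lognormal(mean, sigma, zmin, zmax, n, max_tries=1_000):
        # same rejection loop, but bail out after max_tries batches
        # instead of spinning forever on a (near-)empty region
        zc = np.array([])
        for _ in range(max_tries):
            z = np.random.lognormal(mean=mean, sigma=sigma, size=n * 100)
            zc = np.r_[zc, z[(zmin <= z) & (z < zmax)]]
            if len(zc) >= n:
                return zc[:n]
        raise ValueError("bounds reject (nearly) all samples")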

    Edit

    For efficiency, we can instead use scipy.stats.truncnorm. However, there seems to be an issue when loc, scale != 0, 1, so here we roll our own adjustment:

    # convert X ~ N(0, 1) to Y ~ N(u, s):
    # Y = X * s + u
    # want Y in [a, b]:
    # so X in [(a - u)/s, (b - u)/s]
    #
    # We want Z ~ logN(u, s) in [zmin, zmax]
    # Z = exp(Y) and
    # a, b = np.log((zmin, zmax))
    
    from scipy.stats import truncnorm
    
    def lognormal(a, b, loc=1, scale=1, size=1):
        a, b = np.log((a, b))
        ca, cb = (np.array((a, b)) - loc) / scale
        rv = truncnorm(a=ca, b=cb)
        z = np.exp(rv.rvs(size=size) * scale + loc)
        return z
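
    For what it's worth, truncnorm does accept loc and scale directly; the catch (and likely the issue alluded to above) is that its a and b bounds are always interpreted in standard-deviation units. A sketch of an equivalent variant (the name lognormal2 is mine):

    def lognormal2(a, b, loc=1, scale=1, size=1):
        # bounds must be standardized even when loc/scale are passed
        ca, cb = (np.log((a, b)) - loc) / scale
        return np.exp(truncnorm.rvs(a=ca, b=cb, loc=loc, scale=scale, size=size))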
    

    Example:

    zmin, zmax = 0, 0.5
    z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    

    Or, to suppress the rather noisy warning from np.log(0):

    import warnings
    
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=RuntimeWarning, message='divide by zero encountered in log') 
        z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    

    Either way, we now have:

    >>> np.min(z), np.max(z)
    (0.10787873738453256, 0.4999993445360746)
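
    As a further check, the samples can be compared against the theoretical truncated log-normal CDF with a KS test; a quick sketch:

    from scipy.stats import kstest, lognorm
    
    # CDF of logN(0.2, 0.5) truncated to [zmin, zmax]
    dist = lognorm(s=0.5, scale=np.exp(0.2))
    lo, hi = dist.cdf(zmin), dist.cdf(zmax)
    
    def trunc_cdf(x):
        return (dist.cdf(x) - lo) / (hi - lo)
    
    # a large p-value indicates the samples are consistent with the truncation
    kstest(z, trunc_cdf)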
    

    And we get the same distribution as one would get with the rejection approach above (r), here with size = 10_000:

    def lognormal_naive(a, b, loc=1, scale=1, size=1):
        zc = np.array([])
        while True:
            z = np.random.lognormal(mean=loc, sigma=scale, size=size * 100)
            z = z[(a <= z) & (z < b)]
            zc = np.r_[zc, z]
            if len(zc) >= size:
                break
        return zc[:size]
    
    r = lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
    
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(ncols=2, figsize=(7, 2))
    ax[0].hist(z, bins=100)
    ax[1].hist(r, bins=100);
    

    The timing of the truncnorm version, however, is much better:

    %timeit lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
    64.2 ms ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit lognormal(zmin, zmax, 0.2, 0.5, 10_000)
    2.06 ms ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)