Search code examples
pythonnumpyrandom

Generating numbers in log-normal distribution not giving right parameters in return?


So I am trying to implement a function to generate random number following log-normal distribution myself. (Despite I am testing it in numpy, I need it for somewhere else where out of the box function is unavailable).

Using Box-Muller transform to generate Normally distributed random number and taking exponential of it (suggested in this question), and referring to Wikipedia page I come up with this code:


import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng()

def f():
    mean = 100
    sigma = 50

    u = np.log((mean * mean) / np.sqrt(mean * mean + sigma * sigma))
    d = np.log(1 + (sigma * sigma) / (mean * mean))

    u1 = 0
    while u1 <= 0.0000001:
        u1 = rng.random()
    u2 = rng.random()

    mag = d * np.sqrt(-2 * np.log(u1))
    z0 = mag * np.cos(2 * np.pi * u2) + u
    z1 = mag * np.sin(2 * np.pi * u2) + u

    return (np.exp(z0), np.exp(z1))

iteration = 100000

data = []
for i in range(0, iteration):
    x = f()
    data.append(x[0])
    data.append(x[1])

fig, ax = plt.subplots()
ax.hist(data, bins=100) 

print('mean: ' + str(np.mean(data)))
print('median: ' + str(np.median(data)))
print('stdev: ' + str(np.std(data)))

However, the generated values does not have the right mean and stdev, even though the shape seems to be vaguely correct. Does I miss anything? Or is it just some floating point shenanigans?

matplotlib histogram plot


Solution

  • So problem is getting log-normal with specific mean and stddev values. To do that you have to get log-normal distribution parameters μ, σ from mean and stddev (denoted m and s). THen you sample N(0,1) via Box-Muller. Then you make them N(μ, σ), simple linear transformation. And at last you take exp() of sampled values because log-normal could be seen as distribution for logarithms.

    Code, Win 10 x64, Python 3.10

    import numpy as np
    import matplotlib.pyplot as plt
    
    rng = np.random.default_rng(135797537)
    
    def squared(x: np.float64) -> np.float64:
        return x*x
    
    def sample_normal_BM() -> tuple[np.float64, np.float64]:
        """Sample two normal RVs using Box-Muller method"""
    
        u = 1.0 - rng.random() # guaranteed (0...1]
        v = rng.random()
    
        r = np.sqrt(-2.0 * np.log(u))
        p = 2.0 * np.pi * v
    
        return r*np.cos(p), r*np.sin(p)
    
    def calc_μσ(m: np.float64, s: np.float64) -> tuple[np.float64, np.float64]:
        """From desired mean and stddev calculate μ,σ for lognormal"""
    
        σ = np.sqrt(np.log((squared(s) + squared(m))/squared(m)))
        μ = np.log(m) - squared(σ)/2
        
        return μ, σ
    
    def sample_lognormal(μ: np.float64, σ: np.float64) -> tuple[np.float64, np.float64]:
        n1, n2 = sample_normal_BM()
        
        n1 = σ*n1 + μ
        n2 = σ*n2 + μ
        
        return np.exp(n1), np.exp(n2)
    
    m = 100.0 # desired mean
    s = 50.0  # desired stddev
    
    μ, σ = calc_μσ(m, s)
    
    N = 100000
    
    data = []
    for i in range(0, N):
        ln1, ln2 = sample_lognormal(μ, σ)
        data.append(ln1)
        data.append(ln2)
    
    fig, ax = plt.subplots()
    ax.hist(data, bins=100) 
    
    print('mean: ' + str(np.mean(data)))
    print('median: ' + str(np.median(data)))
    print('stdev: ' + str(np.std(data)))
    

    Output should produce something like enter image description here

    and print out value which are quite close to desired

    mean: 100.00006076705726
    median: 89.3329385030017
    stdev: 50.09847133246042