So I am trying to implement a function to generate random number following log-normal distribution myself. (Despite I am testing it in numpy, I need it for somewhere else where out of the box function is unavailable).
Using Box-Muller transform to generate Normally distributed random number and taking exponential of it (suggested in this question), and referring to Wikipedia page I come up with this code:
import matplotlib.pyplot as plt
import numpy as np
rng = np.random.default_rng()
def f():
mean = 100
sigma = 50
u = np.log((mean * mean) / np.sqrt(mean * mean + sigma * sigma))
d = np.log(1 + (sigma * sigma) / (mean * mean))
u1 = 0
while u1 <= 0.0000001:
u1 = rng.random()
u2 = rng.random()
mag = d * np.sqrt(-2 * np.log(u1))
z0 = mag * np.cos(2 * np.pi * u2) + u
z1 = mag * np.sin(2 * np.pi * u2) + u
return (np.exp(z0), np.exp(z1))
iteration = 100000
data = []
for i in range(0, iteration):
x = f()
data.append(x[0])
data.append(x[1])
fig, ax = plt.subplots()
ax.hist(data, bins=100)
print('mean: ' + str(np.mean(data)))
print('median: ' + str(np.median(data)))
print('stdev: ' + str(np.std(data)))
However, the generated values does not have the right mean and stdev, even though the shape seems to be vaguely correct. Does I miss anything? Or is it just some floating point shenanigans?
So problem is getting log-normal with specific mean and stddev values. To do that you have to get log-normal distribution parameters μ, σ from mean and stddev (denoted m and s). THen you sample N(0,1) via Box-Muller. Then you make them N(μ, σ), simple linear transformation. And at last you take exp() of sampled values because log-normal could be seen as distribution for logarithms.
Code, Win 10 x64, Python 3.10
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(135797537)
def squared(x: np.float64) -> np.float64:
return x*x
def sample_normal_BM() -> tuple[np.float64, np.float64]:
"""Sample two normal RVs using Box-Muller method"""
u = 1.0 - rng.random() # guaranteed (0...1]
v = rng.random()
r = np.sqrt(-2.0 * np.log(u))
p = 2.0 * np.pi * v
return r*np.cos(p), r*np.sin(p)
def calc_μσ(m: np.float64, s: np.float64) -> tuple[np.float64, np.float64]:
"""From desired mean and stddev calculate μ,σ for lognormal"""
σ = np.sqrt(np.log((squared(s) + squared(m))/squared(m)))
μ = np.log(m) - squared(σ)/2
return μ, σ
def sample_lognormal(μ: np.float64, σ: np.float64) -> tuple[np.float64, np.float64]:
n1, n2 = sample_normal_BM()
n1 = σ*n1 + μ
n2 = σ*n2 + μ
return np.exp(n1), np.exp(n2)
m = 100.0 # desired mean
s = 50.0 # desired stddev
μ, σ = calc_μσ(m, s)
N = 100000
data = []
for i in range(0, N):
ln1, ln2 = sample_lognormal(μ, σ)
data.append(ln1)
data.append(ln2)
fig, ax = plt.subplots()
ax.hist(data, bins=100)
print('mean: ' + str(np.mean(data)))
print('median: ' + str(np.median(data)))
print('stdev: ' + str(np.std(data)))
Output should produce something like
and print out value which are quite close to desired
mean: 100.00006076705726
median: 89.3329385030017
stdev: 50.09847133246042