I want to check the fit of my data, which I suspect is lognormally distributed, by plotting a histogram and overlaying the lognormal PDF as a line. I estimate the lognormal parameters from the data and generate n=1000 points (the same number as the data) for the curve. data_list is a list containing 1000 of my data points, which are integers.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
...
data = np.array(data_list)
plt.hist(data, bins=32, density=True, alpha=0.6, color='g', label='Data')
sigma, _, mu = lognorm.fit(np.log(data), floc=0)
x = np.linspace(min(data), max(data), 1000)
lognormal_data = lognorm.pdf(x, sigma, scale=np.exp(mu))
plt.plot(x, lognormal_data, 'r-', lw=2, label='Lognormal Distribution')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.title('Histogram Overlay with Lognormal Distribution')
plt.grid(True)
plt.show()
However, the resulting plot is this:
It seems like the initial parameters for the lognormal distribution are off, as the curve does not coincide with the data. Furthermore, the curve looks more normal than lognormal. Does anybody see what I'm doing wrong here?
I'm no statistician, but if you suspect that data has a lognormal distribution, shouldn't you try to fit data instead of np.log(data)?
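As a rough illustration of this point, here is a minimal sketch on synthetic data (not the asker's data_list; the seed and the names mu_true, sigma_true, mu_hat and sigma_hat are assumptions made for the example). Fitting lognorm to the raw data with floc=0 should give a shape close to sigma and a scale close to exp(mu), which agrees with fitting a normal distribution to np.log(data):

```python
import numpy as np
from scipy.stats import lognorm, norm

# Synthetic lognormal sample with known parameters (assumed values).
rng = np.random.default_rng(0)
mu_true, sigma_true = 1.0, 0.2
data = rng.lognormal(mean=mu_true, sigma=sigma_true, size=1000)

# Fit the lognormal to the RAW data, keeping the location fixed at 0.
# fit returns (shape, loc, scale); for lognorm, shape = sigma and scale = exp(mu).
s, loc, scale = lognorm.fit(data, floc=0)
mu_hat, sigma_hat = np.log(scale), s

# Equivalent view: the log of the data is normal, so fit a NORMAL to np.log(data).
mu_log, sigma_log = norm.fit(np.log(data))

print(mu_hat, sigma_hat)
print(mu_log, sigma_log)
```

Passing np.log(data) to lognorm.fit instead would fit a lognormal to data that is already approximately normal, which is one plausible reason the curve in the question does not match the histogram.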
The documentation of the fit method states that it returns the following:
Estimates for any shape parameters (if applicable), followed by those for location and scale.
The same documentation states that lognorm.pdf has the following signature: pdf(x, s, loc=0, scale=1).
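To make the roles of s, loc and scale in that signature concrete, the sketch below compares lognorm.pdf against the density written out by hand. The helper manual_lognorm_pdf and the parameter values are hypothetical, chosen only for illustration; the formula is the standard lognormal density after shifting by loc and scaling by scale.

```python
import numpy as np
from scipy.stats import lognorm

def manual_lognorm_pdf(x, s, loc=0.0, scale=1.0):
    """Hand-written lognormal density (hypothetical helper for illustration)."""
    y = (x - loc) / scale  # standardised variable, must be positive
    standard = np.exp(-np.log(y) ** 2 / (2 * s ** 2)) / (s * y * np.sqrt(2 * np.pi))
    return standard / scale  # rescaling divides the density by scale

# Assumed example parameters: sigma = 0.2, mu = 1, no shift.
s, loc, scale = 0.2, 0.0, np.exp(1.0)
x = np.linspace(1.0, 6.0, 200)

pdf_scipy = lognorm.pdf(x, s, loc=loc, scale=scale)
pdf_manual = manual_lognorm_pdf(x, s, loc=loc, scale=scale)
max_abs_diff = np.max(np.abs(pdf_scipy - pdf_manual))
print(max_abs_diff)
```

In this parametrisation s plays the role of sigma and scale plays the role of exp(mu), so the values returned by fit should be passed to pdf in the same order, without taking logs or exponentials again.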
I would therefore try the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
data = np.random.lognormal(mean=1, sigma=0.2, size=1000)
plt.hist(data, bins=50, density=True, alpha=0.6, color='g', label='Data')
s, loc, scale = lognorm.fit(data)
x = np.linspace(min(data), max(data), 1000)
lognormal_data = lognorm.pdf(x, s, loc=loc, scale=scale)
plt.plot(x, lognormal_data, 'r-', lw=2, label='Lognormal Distribution')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.legend()
plt.title('Histogram Overlay with Lognormal Distribution')
plt.grid(True)
plt.show()
Output: