Search code examples
pythoncurve-fittinglmfit

Fitting "multimodal" lognormal distributions to data using python


I have the following data measured using an instrument in the lab. Since the instrument collects particles of different sizes in bins based upon their diameter the measurements are essentially "binned":

import numpy as np
import matplotlib.pylab as plt
from lmfit import models

y = np.array([196, 486, 968, 2262, 3321, 4203, 15072, 46789, 95201, 303494, 421484, 327507, 138931, 27973])
bins = np.array([0.0150, 0.0306, 0.0548, 0.0944, 0.1540, 0.2560, 0.3830, 0.6050, 0.9510, 1.6400, 2.4800, 3.6700, 5.3800, 9.9100, 15])

bin_width=np.diff(bins)
x_plot = np.add(bins[:-1],np.divide(bin_width,2))
x=x_plot
y=y

enter image description here

When plotted here is how the data look. There is one mode around 0.1 and another mode around 2 in the units of the x-scale.

Within this research area it is common to fit "multimodal" lognormal distributions to such data: Given this I have fitted the mode around 2 using LMFIT:

model = models.LognormalModel()
params = model.make_params(center=1.5, sigma=0.6, amplitude=2214337)

result = model.fit(y, params, x=x)
print(result.fit_report())

plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.xscale("log")
plt.yscale("log")
plt.legend()
plt.show()

enter image description here

As expected this results in a good fit for the second mode around 2. My question is how would I also go about fitting a second mode around 0.1 (essentially a sum of the two modes should fit the data)? I realise it could also be argued that three modes would be better but I assume once I understand how to use two modes, the addition of a third should be trivial.


Solution

  • lmfit.Models can be added together, as with:

    model = (models.LognormalModel(prefix='p1_') +
             models.LognormalModel(prefix='p2_') +
             models.LognormalModel(prefix='p3_') )
    
    params = model.make_params(p1_center=0.3, p1_sigma=0.2, p1_amplitude=1e4,
                               p2_center=1.0, p2_sigma=0.4, p2_amplitude=1e6,
                               p3_center=1.5, p3_sigma=0.6, p3_amplitude=2e7)
    

    In a composite model, each component of the Model gets its own "prefix" (any string) that prepends the parameter names. You can get a dictionary of a Models components after the fit with:

    components = result.eval_components()
    # {'p1_': Array, 'p2_': Array, 'p3_': Array}
    for compname, comp in components.keys():
        plt.plot(x, comp, label=compname)
    

    For fitting data that you would represent on a semi-log or log-log plot, you might consider fitting a model to log(y). Otherwise, the fit will not be very sensitive to misfit at very low values of y.

    Note that lmfit models and parameters support bounds. You may want to or find that you need to place bounds such as

    params['p1_amplitude'].min = 0
    params['p1_sigma'].min = 0
    params['p1_center'].max = 0.5
    params['p3_center'].min = 1.0