Search code examples
pythonlmfit

How to fit multiple datasets which have a combination of shared and non-shared parameters


I'm trying to fit multiple dataset which should have some variables which are shared by between datasets and others which are not. However I'm unsure of the steps I need to take to do this. Below I've shown the approach that I'm trying to use (from 'Issues begin here' doesn't work, and it just for illustrative purposes).

In this answer somebody is able to share parameters arcoss datasets is there some way that this could be adapted so that I can also have some non-shared parameters?

Does anybody have an idea how I could achieve this, or could somebody suggegst a better approach to acheive the same result? Thanks.

import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
import pandas as pd
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit, Model


# Create datasets to fit
a = 1.99
start = gamma.ppf(0.001, a)
stop = gamma.ppf(.99, a)
xvals = np.linspace(start, stop, 100)
yvals = gamma.pdf(xvals, a)
data_dict = {}
for dataset in range(4):
    name = 'dataset_' + str(dataset)
    rand_offset = np.random.uniform(-.1, .1)
    noise = np.random.uniform(-.05, .05,len(yvals)) + rand_offset

    data_dict[name] = yvals + noise
df = pd.DataFrame(data_dict)

# Create some non-shared parameters
non_shared_param1 = np.random.uniform(0.009, .21, 4)
non_shared_param2 = np.random.uniform(0.01, .51, 4)

# Create the independent variable
ind_var = np.linspace(.001,100,100)

# Create a model
def model_func(time, Qi, at, vw, R, rhob_cb, al, NSP1, NSP2):
    Dt = at * vw
    Dl = al * vw

    t = time

    first_bot = 8 * np.pi * t * rhob_cb
    sec_bot = np.sqrt((np.pi * (Dl * R) * t))
    exp_top = R * np.power((NSP1 - ((t * vw)/R)), 2)
    exp_bot = 4 * Dl * t
    exp_top2 = R * np.power(NSP2, 2)
    exp_bot2 = 4 * Dt * t
    return (Qi / first_bot * sec_bot) * np.exp(- (exp_top / exp_bot) - (exp_top2 / exp_bot2))

model = Model(model_func)

### Issues begin here ###

all_results = {}
index = 0
for col in df:
    # This block assigns the correct non-shared parameter for the particular fit
    nsp1 = non_shared_param1[index]
    nsp2 = non_shared_param2[index]
    index += 1

    params = Parameters()
    at = 0.1 
    al = 0.15
    vw = 10**-4
    Dt = at * vw
    Dl = al * vw

    # Non-shared parameters
    model.set_param_hint('NSP1', value = nsp1)
    model.set_param_hint('NSP2', value = nsp2)

    # Shared and varying parameters
    model.set_param_hint('vw', value =10**-4, min=10**-10)
    model.set_param_hint('at', value =0.1)
    model.set_param_hint('al', value =0.15)

    # Shared and fixed parameters
    model.set_param_hint('Qi', value = 1000, vary = True)
    model.set_param_hint('R', value = 1.7, vary = True)
    model.set_param_hint('rhob_cb', value =2895, vary = True)

    # One set of parameters should be returned
    result = model.fit(df[col], time = ind_var)

    all_results[index] = result

Solution

  • A fit with lmfit always uses a single instance of a Parameters object, it does not take multiple Parameters objects.

    In order to simultaneously fit multiple data sets with the similar models (perhaps the same mathematical model, but expecting different parameter values for each model), you need to have an objective function that concatenates the residuals from the different component models. And each of those models has to have parameters that are taken from the single instance of Parameters(), each parameter having a unique name.

    So, to fit 2 data sets with the same function (let's use Gaussian, with parameters "center", "amplitude", and "sigma"), you might define the Parameters as

    params =  Parameters()
    params.add('center_1',    5., vary=True)
    params.add('amplitude_1', 10., vary=True)
    params.add('sigma_1',    1.0, vary=True
    params.add('center_2',    8., vary=True)
    params.add('amplitude_2', 3., vary=True)
    params.add('sigma_2',    2.0, vary=True)
    

    Then use 'center_1', 'amplitude_1', and 'sigma_1' to calculate the model for the first data set and 'center_2', etc to calculate the the model for the second, perhaps as

    def residual(params, x, datasets):
        model1 = params['amplitude_1'] * gaussian(x, params['center_1'], params['sigma_1'])
        model2 = params['amplitude_2'] * gaussian(x, params['center_2'], params['sigma_2']
    
        resid1 = datasets[0] - model1
        resid2 = datasets[1] - model2
        return np.concatenate((resid1, resid2))
    
    fit = lmfit.minimize(residual, params, fcn_args=(x, datasets))
    

    As you might be able to see from this, Parameter values are independent by default. In order to share parameter values to be used in different datasets, you have to do that explicitly (as shown in the linked answer you provide).

    For example, if you want to require the sigma values to be the same, would not change the residual function, just the Parameter definition above to:

    params.add('sigma_2', expr='sigma_1')
    

    You could require the two amplitudes to add to some value:

    params.add('amplitude_2', expr='10 - amplitude_1')
    

    or perhaps you would want to ensure that 'center_2' is larger than 'center_1', but by an amount to be determined in the fit:

    params.add('center_offset', value=0.5, min=0)
    params.add('center_2',  expr='center_1 + center_offset')
    

    Those are all ways to tie parameter values. By default, they're independent. Of course, you can also have some parameters that get used in all the models (say, just call the parameter 'sigma' and use it in for all models).