python python-3.x machine-learning artificial-intelligence data-analysis

How to fit multiple exponential curves using python

For a single exponential curve such as shown in the image here curve_fit for as single exponential curve , I am able to fit the data using scipy.optimize.curve_fit. However, I am unsure on how to realize a fit for similar dataset composed of multiple exponential curves as shown here double exponential curves. I achieved the fit for the single curve using the following approach:

def exp_decay(x,a,r):
    return a * ((1-r)**x) 

x = np.linspace(0,50,50)
y = exp_decay(x, 400, 0.06)

y1 = exp_decay(x, 550, 0.06)      # this is to be used to append to y to generate two curves

pars, cov = curve_fit(exp_decay, x, y, p0=[0,0])
plt.scatter(x,y)
plt.plot(x, exp_decay(x, *pars), 'r-')     #this realizes the fit for a single curve

yx = np.append(y,y1)   #this realizes two exponential curves (as shown above - double exponential curves) for which I don't need to fit a model to

Can someone help describe how to achieve this for a dataset of two curves. My actual dataset comprises of multiple exponential curves but I think if I can realize a fit for two curves, I may be able to replicate same for my dataset. This must not be done with scipy's curve_fit; any implementation that works is fine.

PLEASE HELP !!!

Solution

Your problem can easily be tackled by splitting your dataset using a simple criterion such as first derivative estimate and then we can apply simple curve fitting procedure to each sub dataset.

Trial Dataset

First, let's import some packages and create a synthetic dataset with three curves to represent your problem.

We use a two parameters exponential model as time origin shift will be handled by the splitting methodology. We also add noise as there is always noise on real world data:

import numpy as np
import pandas as pd
from scipy import optimize
import matplotlib.pyplot as plt

def func(x, a, b):
    return a*np.exp(b*x)

N = 1001
n1 = N//3
n2 = 2*n1

t = np.linspace(0, 10, N)

x0 = func(t[:n1], 1, -0.2)
x1 = func(t[n1:n2]-t[n1], 5, -0.4)
x2 = func(t[n2:]-t[n2], 2, -1.2)

x = np.hstack([x0, x1, x2])
xr = x + 0.025*np.random.randn(x.size)

Graphically it renders as follow:

Dataset Splitting

We can split the dataset into three sub-datasets using a simple criterion as first derivative estimate using first difference to assess it. The goal is to detect when curve drastically goes up or down (where dataset should be split. First derivative is estimated as follow):

dxrdt = np.abs(np.diff(xr)/np.diff(t))

The criterion requires an extra parameter (threshold) that must be tuned accordingly to your signal specifications. The criterion is equivalent to:

xcrit = 20
q = np.where(dxrdt > xcrit) # (array([332, 665], dtype=int64),)

And split index are:

idx = [0] + list(q[0]+1) + [t.size] # [0, 333, 666, 1001]

Mainly the criterion threshold will be affected by the nature and the power of the noise on your data and the gap magnitudes between two curves. The usage of this methodology depends on the ability to detect curves gap in presence of noise. It will break when the noise power has the same magnitude of the gap we want to detect. You can also observe false split index if the noise is heavily tailed (few strong outliers).

In this MCVE, we have set the threshold to 20 [Signal Units/Time Units]:

An alternative to this hand-crafted criterion is to delegate the identification to the excellent find_peaks method of scipy. But it will not avoid the requirement to tune the detection to your signal specifications.

Fit origin-shifted dataset

Now we can apply the curve fitting on each sub-dataset (with origin shifted time), collect parameters and statistics and plot the result:

trials = []
fig, axe = plt.subplots()
for k, (i, j) in enumerate(zip(idx[:-1], idx[1:])):
    p, s = optimize.curve_fit(func, t[i:j]-t[i], xr[i:j])
    axe.plot(t[i:j], xr[i:j], '.', label="Data #{}".format(k+1))
    axe.plot(t[i:j], func(t[i:j]-t[i], *p), label="Data Fit #{}".format(k+1))
    trials.append({"n0": i, "n1": j, "t0": t[i], "a": p[0], "b": p[1],
                   "s_a": s[0,0], "s_b": s[1,1], "s_ab": s[0,1]})
axe.set_title("Curve Fits")
axe.set_xlabel("Time, $t$")
axe.set_ylabel("Signal Estimate, $\hat{g}(t)$")
axe.legend()
axe.grid()
df = pd.DataFrame(trials)

It returns the following fitting results:

    n0    n1    t0         a         b       s_a           s_b      s_ab
0    0   333  0.00  0.998032 -0.199102  0.000011  4.199937e-06 -0.000005
1  333   666  3.33  5.001710 -0.399537  0.000013  3.072542e-07 -0.000002
2  666  1001  6.66  2.002495 -1.203943  0.000030  2.256274e-05 -0.000018

Which complies with our original parameters (see Trial dataset section).

Graphically we can check the goodness of fits: