Search code examples
pythonscipycurve-fittingdifferential-equationsscipy-optimize

Fitting curve where each point is the solution of an ODE


I have the following code:

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import random

def dose(y, t, b, s, c, p, i):
    target, infectious, virus = y
    dydt = [-b*target*virus, b*target*virus - s*infectious, (1/(i+1))*p*infectious - c*virus]
    return dydt


b = 0.00001
s = 4
c = 4
p = 2000000
D = np.logspace(-3, 3, 7)
mylist = []

y0 = [1, 0, 0.01]
t = np.linspace(0, 60, 1000)
for i in D:
    sol = odeint(dose, y0, t, args=(b, s, c, p, i))
    #plt.plot(t, sol[:, 0], label='D = ' + str(i))
    V = sol[:, 2]
    mylist.append(V[48]/0.01950269536785707)


def add_noise(d, noise_pct):
    return [x + random.gauss(0, noise_pct * x) for x in d]

mat = np.array(mylist)

mylist2 = add_noise(mylist, 0.05)
mat2 = np.array(mylist2)
plt.plot(D, mat2)

plt.xscale('log')
#plt.plot(t, sol[:, 2], 'r', label='virus')
plt.legend(loc='best')
plt.xlabel('time')
plt.grid()
#plt.show()
print(D)
print(mat2)

Now D and mat2 contain the x and y values that I would like to use to fit to the solution of my original system of ODEs in the function dose. I would like to plot the new fitted function as well.

How would I do this?

To clarify, I added an error varaition of 5% to 7 data points from the original graph and got a np.array of x and y values that I want to find a curve of best fit for, which follows the same differential equations as the original graph.


Solution

  • Lets reformulate a bit your problem to make it more efficient and fittable.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import integrate, optimize
    

    We write the dynamic of your system to be compliant with new solver interface solve_ivp:

    def dose(t, y, b, s, c, p, d):
        target, infectious, virus = y
        return np.array([
            -b * target * virus,
            b * target * virus - s * infectious,
            (1. / (d + 1.)) * p * infectious - c * virus
        ])
    

    Now we write the model function to have a signature compatible with curve_fit, we also operate simplification in solving the ODE in order to improve time complexity of the algorithm:

    def model(D, b, s, c, p):
        solutions = []
        for d in D:
            solution = integrate.solve_ivp(
                dose, [0, 5], y0=[1, 0, 0.01],
                t_eval=[2.8828828828828827], # Translating np.linspace(0, 60, 1000)[48]
                args=(b, s, c, p, d)
            )
            data = solution.y[2, 0] / 0.01950269536785707  # I'm intrigued by this part of the your code I translated
            solutions.append(data)
        return np.array(solutions)
    

    Now we generate some synthetic dataset:

    b = 0.00001
    s = 4
    c = 4
    p = 2000000
    p0 = (b, s, c, p)
    
    np.random.seed(12345)
    D = np.logspace(-3, 3, 20)
    z = model(D, b, s, c, p)
    s = np.ones_like(z) * 0.05
    n = s * np.random.normal(size=s.size) * z
    zn = z + n
    

    And we perform the adjustment:

    popt, pcov = optimize.curve_fit(
        model, D, zn, p0=[1e-5, 1, 1, 1e6],
        method="trf", bounds=(0, np.inf),
        sigma=s, absolute_sigma=True
    )
    
    # (array([1.98458777e-05, 3.39383754e+00, 4.55115392e+00, 1.00007348e+06]),
    #  array([[ 8.35308599e-10, -3.25230641e-03,  3.73469971e-03,
    #          -5.22169803e-11],
    #         [-3.25230641e-03,  1.28672442e+04, -1.47634805e+04,
    #           2.06436797e-04],
    #         [ 3.73469971e-03, -1.47634805e+04,  1.69398903e+04,
    #          -2.36868204e-04],
    #         [-5.22169803e-11,  2.06436797e-04, -2.36868204e-04,
    #           3.31209815e-12]]))
    

    Parameters order of magnitude are alike, fitness is good:

    Dlog = np.logspace(-3, 3, 200)
    fig, axe = plt.subplots()
    axe.scatter(D, zn, label="Data")
    axe.semilogx(Dlog, model(Dlog, *popt), label="Fit")
    axe.semilogx(Dlog, model(Dlog, *p0), "--", label="Model")
    axe.legend()
    axe.grid()
    

    enter image description here

    Which shows how to technically perform the regression. Notice than errors on s and c are very high, probably explaining why we cannot recover initial parameters.

    Additionally it seems that multiple combination of parameters can lead to equivalent fitness: Regressed and original parameter gives the same curve.