Search code examples
pythoncurve-fittingdifferential-equationsgekko

Fitting two populations to measurements using GEKKO how to optimize the first data point


I have this code to try to fit the sum of two populations to a measurements data-series.

dS/dt = (a - b) * S
dR/dt = (a - b - c)* R
X(t) = S(t) + R(t)

I want also to optimize the first point which instead now is fixed, I had thought to use a new variable called f which stand in between 0 and 1 and that represents the initial ratio between population 1 and population 2. Something like this:

S0 = x_meas[0] * f
R0 = x_meas[0] * (1-f)

Here is the code:

from gekko import GEKKO
m = GEKKO(remote=False)

# data
m.time = [0,0.1,0.2,0.3,0.4,0.5,0.8,0.85,0.9,0.95,1]
x_meas = [1.1,1.2,1.56,1.77,1.89,2.1,2.3,2.5,2.7,2.2,2.1]

# parameters, single value over time horizon
p = m.Array(m.FV,3,lb=0.5,ub=10); a,b,c = p
# turn STATUS On for adjustment by solver
for fv in p:
   fv.STATUS=1

f0 = m.FV(value=0.01, lb=0, ub=1)
f0.STATUS=1
# variables
S,R,X = m.Array(m.Var,3,lb=0)
# change initial conditions
S0,R0 = [0.35,0.65]; X0 = S0+R0
S.value=S0; R.value=R0; X.value=X0

# measured values
Xm = m.Param(x_meas)

# equations
m.Equations([S.dt()==(a-b)*S,
             R.dt()==(a-b-c)*R,
             X==S+R])

# minimize difference to measurements
m.Minimize((X-Xm)**2)

m.options.IMODE = 5   # dynamic estimation
m.options.NODES = 3   # collocation nodes
m.solve(disp=False)   # display solver output

print(f'a: {a.value[0]}, b: {b.value[0]}, c: {c.value[0]}')

import matplotlib.pyplot as plt
plt.figure(figsize=(7,3))
plt.plot(m.time,S.value,'b:',linewidth=3,label='S')
plt.plot(m.time,R.value,'r--',linewidth=2,label='R')
plt.plot(m.time,X.value,'k--',label='X')
plt.plot(m.time,Xm.value,'kx',label='Xm')
plt.legend(); plt.grid(); plt.xlabel('Time')
plt.tight_layout()
plt.savefig('fit.png',dpi=300)
plt.show()

Solution

  • By default, gekko is set up as an initial value problem where the values at t=0 are fixed and the equations are not solved at that initial point. One way to still fit into this framework with calculated initial values is to set fixed_initial=False and the differential equation initial condition is calculated. However, this doesn't fix the issue that the equations for the first node in the time horizon are deactivated.

    To overcome this, try setting another very small time increment in the time horizon with a very small time-step, such as 1e-5. Create the fraction f as an FV type and set the STATUS=1 to calculate that single value with the optimizer. I included bounds on f between 0.2 and 0.8, just to make the problem more interesting and they can be changed.

    # initial condition variables
    f = m.FV(value=0.5,lb=0.2,ub=0.8)
    f.STATUS = 1
    

    Activate the equations for the relation between R, S and f only on that small-time step.

    # activate only for initial condition
    init = np.zeros(len(m.time)); init[1]=1
    init = m.Param(init)
    m.Equations([init*(S-x_meas[0]*f)==0,
                 init*(R-x_meas[0]*(1-f))==0])
    

    The remainder of the problem stays the same.

    Fit to data

    from gekko import GEKKO
    import numpy as np
    m = GEKKO(remote=True)
    
    # data
    m.time = [0,1e-5,0.1,0.2,0.3,0.4,0.5,0.8,0.85,0.9,0.95,1]
    x_meas = [1.1,1.1,1.2,1.56,1.77,1.89,2.1,2.3,2.5,2.7,2.2,2.1]
    
    # parameters, single value over time horizon
    p = m.Array(m.FV,3,lb=0.5,ub=10); a,b,c = p
    # turn STATUS On for adjustment by solver
    for fv in p:
       fv.STATUS=1
    
    # variables
    S,R,X = m.Array(m.Var,3,value=0.5,lb=0,fixed_initial=False)
    
    # initial condition variables
    f = m.FV(value=0.5,lb=0.2,ub=0.8)
    f.STATUS = 1
    
    # activate only for initial condition
    init = np.zeros(len(m.time)); init[1]=1
    init = m.Param(init)
    m.Equations([init*(S-x_meas[0]*f)==0,
                 init*(R-x_meas[0]*(1-f))==0])
    
    # change initial condition for X
    X.value=x_meas[0]
    
    # measured values
    Xm = m.Param(x_meas)
    
    # equations
    m.Equations([S.dt()==(a-b)*S,
                 R.dt()==(a-b-c)*R,
                 X==S+R])
    
    # minimize difference to measurements
    m.Minimize((X-Xm)**2)
    
    m.options.IMODE = 5   # dynamic estimation
    m.options.NODES = 2   # collocation nodes
    m.options.SOLVER = 1  # APOPT solver
    m.solve(disp=True)    # display solver output
    
    print(f'a: {a.value[0]}, b: {b.value[0]}, c: {c.value[0]}')
    print(f'S0: {S.value[0]}, R0: {R.value[0]}, f:{f.value[0]}')
    
    import matplotlib.pyplot as plt
    plt.figure(figsize=(7,3))
    plt.plot(m.time,S.value,'b:',linewidth=3,label='S')
    plt.plot(m.time,R.value,'r--',linewidth=2,label='R')
    plt.plot(m.time,X.value,'k--',label='X')
    plt.plot(m.time,Xm.value,'kx',label='Xm')
    plt.legend(); plt.grid(); plt.xlabel('Time')
    plt.tight_layout()
    plt.savefig('fit.png',dpi=300)
    plt.show()
    

    It solves with a successful solution:

     Iter    Objective  Convergence
       20  1.01871E+00  1.15106E-04
       21  1.01863E+00  1.02840E-04
       22  1.01863E+00  3.60679E-09
       23  1.01863E+00  5.74752E-10
       24  1.01863E+00  5.74752E-10
     Successful solution
     
     ---------------------------------------------------
     Solver         :  APOPT (v1.0)
     Solution time  :   6.410000000323635E-002 sec
     Objective      :    1.01863407387195     
     Successful solution
     ---------------------------------------------------
     
    a: 1.6769964544, b: 0.50197865393, c: 0.5
    S0: 0.21999741496, R0: 0.87999405984, f:0.2