Tags: python, optimization, regression, nonlinear-optimization, gekko

Multiple equations minimized in regression IMODE of gekko


I would like to minimize the difference between measured and predicted y in two datasets of different sizes, using two equations that share the same variables. I am using gekko as my solver.

When my two datasets have a different number of rows, I get the exception "Data arrays must have the same length, and match time discretization in dynamic problems". However, even when I set the number of rows to be equal, I get the error "duplicate variable declarations". This leads me to believe that my optimization problem is ill-defined for solving two equations that share variables.

Variables: v1, v2

Parameters (ds = dataset):

ds1_p1, ds1_p2 
ds2_p1, ds2_p2, ds2_p3

Equations:

y1 = v1 * ds1_p1 + v2 * ds1_p2
y2 = v1 * v2 * ds2_p1 + v1 * ds2_p2 + v2 * ds2_p3

Minimize:

((y1_pred - y1_meas) / y1_meas)^2 
((y2_pred - y2_meas) / y2_meas)^2
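To make the objective concrete, here is a minimal NumPy sketch of one reading of it: the two squared relative errors are summed, each over the rows of its own dataset. The function name `combined_objective`, the `y_meas` key, and the seeded random data are illustrative assumptions and are not part of the original code.

```python
import numpy as np


def combined_objective(v1, v2, ds1, ds2):
    """Sum of squared relative errors over both datasets (hypothetical helper)."""
    # Predictions from the two equations that share v1 and v2
    y1_pred = v1 * ds1["p1"] + v2 * ds1["p2"]
    y2_pred = v1 * v2 * ds2["p1"] + v1 * ds2["p2"] + v2 * ds2["p3"]
    # Relative residuals; each dataset keeps its own length
    r1 = (y1_pred - ds1["y_meas"]) / ds1["y_meas"]
    r2 = (y2_pred - ds2["y_meas"]) / ds2["y_meas"]
    return float(np.sum(r1**2) + np.sum(r2**2))


# Synthetic data with 50 and 100 rows, generated with v1 = 2 and v2 = 4
rng = np.random.default_rng(0)
ds1 = {f"p{i}": rng.random(50) for i in range(1, 3)}
ds2 = {f"p{i}": rng.random(100) for i in range(1, 4)}
ds1["y_meas"] = 2 * ds1["p1"] + 4 * ds1["p2"]
ds2["y_meas"] = 2 * 4 * ds2["p1"] + 2 * ds2["p2"] + 4 * ds2["p3"]

obj_true = combined_objective(2.0, 4.0, ds1, ds2)   # at the generating values
obj_guess = combined_objective(1.0, 3.0, ds1, ds2)  # at the initial guess
print(obj_true, obj_guess)
```

Nothing in this form of the objective requires the two datasets to have the same number of rows, since each sum runs over its own data.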

My simplified code for this two-equation example is shown below (my actual code has many more equations, variables, and parameters):

import numpy as np
from gekko import GEKKO

# define GEKKO model
m = GEKKO()
# variables and parameters
variables = {"v1": {"val": 1, "lb": 0, "ub": 2}, "v2": {"val": 3, "lb": 1, "ub": 5}}

# create variable dictionary
v_dic = {
    var_name: m.FV(
        value=var_val["val"],
        lb=var_val["lb"],
        ub=var_val["ub"],
        name=var_name,
    )
    for var_name, var_val in variables.items()
}

# set variables to be available for optimizer
for var in v_dic.values():
    var.STATUS = 1

parameters = {
    "dataset1": {f"p{i}": np.random.random(50) for i in range(1, 3)},
    "dataset2": {f"p{i}": np.random.random(100) for i in range(1, 4)},
}

# create parameters dictionary
p_dic = {
    ds_name: {
        par_name: m.Param(value=par_val, name=par_name)
        for par_name, par_val in ds_val.items()
    }
    for ds_name, ds_val in parameters.items()
}

# v1 = 2, v2 = 4
ym = {
    "dataset1": {"ymeas": 2 * p_dic["dataset1"]["p1"] + 4 * p_dic["dataset1"]["p2"]},
    "dataset2": {
        "ymeas": 2 * 4 * p_dic["dataset2"]["p1"]
        + 2 * p_dic["dataset2"]["p2"]
        + 4 * p_dic["dataset2"]["p3"]
    },
}

# create y measured dictionary
ym_dic = {
    ds_name: m.Param(value=ds_val["ymeas"], name=f"ymeas_{ds_name}")
    for ds_name, ds_val in ym.items()
}

# create y predicted dictionary
yp_dic = {ds_name: m.Var() for ds_name in ym_dic.keys()}

# define the equation for each dataset
m.Equation(
    yp_dic["dataset1"]
    == v_dic["v1"] * p_dic["dataset1"]["p1"] + v_dic["v2"] * p_dic["dataset1"]["p2"]
)

m.Equation(
    yp_dic["dataset2"]
    == v_dic["v1"] * v_dic["v2"] * p_dic["dataset2"]["p1"]
    + v_dic["v1"] * p_dic["dataset2"]["p2"]
    + v_dic["v2"] * p_dic["dataset2"]["p3"]
)

# minimize each equation
for d_s in yp_dic.keys():
    m.Minimize(((yp_dic[d_s] - ym_dic[d_s]) / ym_dic[d_s]) ** 2)

m.options.IMODE = 2  # regression mode

m.solve()

# print solution
print("Solution")
for name, var in v_dic.items():
    print(f"{name} == {var.value[0]}")

Solution

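  • IMODE=2 is built for problems where the model equations are written once and then duplicated for each row of data, which is why it expects all data arrays to have the same length. For datasets of different lengths, use IMODE=3 and use list comprehensions or loops to build separate equations or objective terms for each data set.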

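    from gekko import GEKKO
    m = GEKKO()
    # two input data sets of different length (5 and 7 points)
    x = [1,2,3,4,5]
    y = [1,2,3,4,5,6,7]
    # shared parameter, adjustable by the optimizer
    a = m.FV(); a.STATUS=1
    # measured values; the first 5 are reused for x
    z = [2,4,6,8,10,12,14]

    # one squared-error objective term per data point
    [m.Minimize((z[i]-a*x[i])**2) for i in range(len(x))]
    [m.Minimize((z[i]-a*y[i])**2) for i in range(len(y))]

    m.options.IMODE=3  # steady-state optimization
    m.solve(disp=False)

    print(a.value[0])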
    

    The solution for the shared parameter is a=2.0, even though x and y have different data lengths (5 and 7 points).