I would like to minimize the difference between measured and predicted y for two datasets of different sizes, using two equations that share the same variables. I am using gekko as my solver.
When my two datasets have a different number of rows, I get the exception "Data arrays must have the same length, and match time discretization in dynamic problems". However, even when I give both datasets the same number of rows, I get the error "duplicate variable declarations". This leads me to believe that my optimization problem is not set up correctly for two equations that share variables.
Variables: v1, v2
Parameters (ds = dataset):
ds1_p1, ds1_p2
ds2_p1, ds2_p2, ds2_p3
Equations:
y1 = v1 * ds1_p1 + v2 * ds1_p2
y2 = v1 * v2 * ds2_p1 + v1 * ds2_p2 + v2 * ds2_p3
Minimize the sum, over all rows of both datasets, of:
((y1_pred - y1_meas) / y1_meas)^2
((y2_pred - y2_meas) / y2_meas)^2
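Written out, the objective is a single sum: each dataset contributes its own sum of squared relative errors, so nothing in the math requires the two datasets to have the same number of rows. A minimal NumPy sketch of that combined objective, assuming each dataset is stored as a 2-D array with one column per parameter (the array layout and the function name combined_objective are illustrative assumptions, not part of gekko):

```python
import numpy as np


def combined_objective(v1, v2, ds1, y1_meas, ds2, y2_meas):
    """Sum of squared relative errors over both datasets.

    Assumed layout: ds1 has columns (p1, p2), ds2 has columns (p1, p2, p3).
    The two datasets may have different numbers of rows.
    """
    y1_pred = v1 * ds1[:, 0] + v2 * ds1[:, 1]
    y2_pred = v1 * v2 * ds2[:, 0] + v1 * ds2[:, 1] + v2 * ds2[:, 2]
    err1 = np.sum(((y1_pred - y1_meas) / y1_meas) ** 2)  # one sum per dataset
    err2 = np.sum(((y2_pred - y2_meas) / y2_meas) ** 2)
    return err1 + err2


rng = np.random.default_rng(0)
ds1 = rng.random((50, 2))   # 50 rows
ds2 = rng.random((100, 3))  # 100 rows
# synthetic measurements generated with v1 = 2, v2 = 4
y1_meas = 2 * ds1[:, 0] + 4 * ds1[:, 1]
y2_meas = 2 * 4 * ds2[:, 0] + 2 * ds2[:, 1] + 4 * ds2[:, 2]

print(combined_objective(2, 4, ds1, y1_meas, ds2, y2_meas))  # 0.0 at the true values
print(combined_objective(1, 3, ds1, y1_meas, ds2, y2_meas))  # positive elsewhere
```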
My simplified code for this two-equation example is shown below (my actual code has many more equations, variables, and parameters):
import numpy as np
from gekko import GEKKO

# define GEKKO model
m = GEKKO()
# variables and parameters
variables = {"v1": {"val": 1, "lb": 0, "ub": 2}, "v2": {"val": 3, "lb": 1, "ub": 5}}
# create variable dictionary
v_dic = {
var_name: m.FV(
value=var_val["val"],
lb=var_val["lb"],
ub=var_val["ub"],
name=var_name,
)
for var_name, var_val in variables.items()
}
# set variables to be available for optimizer
for var in v_dic.values():
var.STATUS = 1
parameters = {
"dataset1": {f"p{i}": np.random.random(50) for i in range(1, 3)},
"dataset2": {f"p{i}": np.random.random(100) for i in range(1, 4)},
}
# create parameters dictionary
p_dic = {
ds_name: {
par_name: m.Param(value=par_val, name=par_name)
for par_name, par_val in ds_val.items()
}
for ds_name, ds_val in parameters.items()
}
# synthetic measurements generated with the true values v1 = 2, v2 = 4
ym = {
"dataset1": {"ymeas": 2 * p_dic["dataset1"]["p1"] + 4 * p_dic["dataset1"]["p2"]},
"dataset2": {"ymeas": 2 * 4 * p_dic["dataset2"]["p1"] + 2 * p_dic["dataset2"]["p2"] + 4 * p_dic["dataset2"]["p3"]},
}
# create y measured dictionary
ym_dic = {
ds_name: m.Param(value=ds_val["ymeas"], name=f"ymeas_{ds_name}")
for ds_name, ds_val in ym.items()
}
# create y predicted dictionary
yp_dic = {ds_name: m.Var() for ds_name in ym_dic.keys()}
# define the equation for each dataset
m.Equation(
yp_dic["dataset1"]
== v_dic["v1"] * p_dic["dataset1"]["p1"] + v_dic["v2"] * p_dic["dataset1"]["p2"]
)
m.Equation(
yp_dic["dataset2"] == v_dic["v1"] * v_dic["v2"] * p_dic["dataset2"]["p1"] + v_dic["v1"] * p_dic["dataset2"]["p2"] + v_dic["v2"] * p_dic["dataset2"]["p3"]
)
# minimize the squared relative error of each dataset
for d_s in yp_dic.keys():
m.Minimize(((yp_dic[d_s] - ym_dic[d_s]) / ym_dic[d_s]) ** 2)
m.options.IMODE = 2 # regression mode
m.solve()
# print solution
print("Solution")
for name, var in v_dic.items():
print(f"{name} == {var.value[0]}")
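The "duplicate variable declarations" error is consistent with a name clash in p_dic: name=par_name gives the Param objects of both datasets the names p1 and p2. Assuming gekko requires every name passed through name= to be unique within one model (which the error message suggests), prefixing the dataset name avoids the clash. A plain-Python sketch of that check, with no gekko calls and a simplified stand-in for the parameters dictionary:

```python
from collections import Counter

# Simplified stand-in for the question's parameters dict: only the names matter here
parameters = {"dataset1": ["p1", "p2"], "dataset2": ["p1", "p2", "p3"]}

# Naming scheme used in p_dic (name=par_name): names repeat across datasets
names = [par for pars in parameters.values() for par in pars]
dupes = [n for n, c in Counter(names).items() if c > 1]
print(dupes)  # ['p1', 'p2']

# Hypothetical fix: prefix each name with its dataset so every name is unique
unique = [f"{ds}_{par}" for ds, pars in parameters.items() for par in pars]
print(unique)
```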
IMODE=2 is built for problems where the model equations are written once and duplicated for each row of data, so every data array in the model must have the same length. Use IMODE=3 instead, with list comprehensions or loops that build separate equations for each data set.
from gekko import GEKKO
m = GEKKO()
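x = [1,2,3,4,5]               # data set 1: 5 points
y = [1,2,3,4,5,6,7]           # data set 2: 7 points
a = m.FV(); a.STATUS=1        # shared parameter adjusted by the optimizer
z = [2,4,6,8,10,12,14]
# one squared-error term per point of each data set
[m.Minimize((z[i]-a*x[i])**2) for i in range(len(x))]
[m.Minimize((z[i]-a*y[i])**2) for i in range(len(y))]
m.options.IMODE=3             # steady-state optimization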
m.solve(disp=False)
print(a.value[0])
The common parameter solution is a=2.0, even though the data sets x and y have different lengths.
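Because the model z = a*x is linear in a, the reported a=2.0 can also be checked by hand. Setting the derivative of the summed squared error to zero gives a = (sum of z_i*x_i + sum of z_j*y_j) / (sum of x_i^2 + sum of y_j^2), where each sum runs over its own data set and z_i, z_j are paired with x and y exactly as in the code above. A short plain-Python check with the same lists (no gekko involved):

```python
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, 6, 7]
z = [2, 4, 6, 8, 10, 12, 14]

# Numerator and denominator each add one sum per data set,
# so the two sets can have different lengths.
num = sum(z[i] * x[i] for i in range(len(x))) + sum(z[i] * y[i] for i in range(len(y)))
den = sum(xi ** 2 for xi in x) + sum(yi ** 2 for yi in y)
a = num / den
print(a)  # 2.0
```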