Using the predict() methods of fitted models with gekko


Many model-fitting Python packages have a predict() method, which outputs a prediction of the fitted model given observations of the predictor(s).

Question: How would I use these predict() methods to predict a single value when the observation is a variable in a gekko model?

Note: The actual model that I am fitting is a B-spline built with statsmodels.gam.smooth_basis.BSplines and statsmodels.gam.generalized_additive_model.GLMGam. However, I am hoping that this simple example with sklearn.linear_model.LinearRegression will translate to more complex model classes from other packages.

Below is a minimal reproducible example:

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
from gekko import GEKKO

# create example data
x = np.arange(100)[:, np.newaxis]
y = np.arange(100) * 2 + 10

plt.plot(x, y)  # plot x vs y data
# plt.show()

model = LinearRegression()  # instantiate linear model
model.fit(x, y)  # fit model

x_predict = np.arange(100, 200)[:, np.newaxis]  # create array of predictor observations
y_predict = model.predict(x_predict)  # use model to make prediction

plt.plot(x_predict, y_predict)  # plot prediction
# plt.show()

m = GEKKO()  # instantiate gekko model

x2 = m.FV()  # instantiate free variable
x2.STATUS = 1  # make variable available for solver

y2 = 50  # true value

# place x2 variable in numpy array to adhere to predict()'s argument requirements
x2_arr = np.array(x2).reshape(1, -1)

# minimize squared error between the true value and the model's prediction
m.Minimize((y2 - model.predict(x2_arr)) ** 2)

m.options.IMODE = 3
# solve for x2
m.solve(disp=True)

print(f"x2 = {x2.value[0]:.3f}")

I get the following two errors, the second raised while handling the first:

TypeError: float() argument must be a string or a real number, not 'GK_FV'

ValueError: setting an array element with a sequence.

My first thought is that I would have to create a wrapper class around the gekko.gk_parameter.GK_FV class to modify the float() method, but that's where my knowledge and skills end.
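
The wrapper idea can be checked in isolation. The sketch below is not gekko code: FloatWrapper and predict are hypothetical stand-ins for a wrapped solver variable and a fitted model's predict() method. It suggests that making float() succeed would only hand predict() the variable's current number, so the result would be a constant rather than an expression the solver can optimize over.

```python
class FloatWrapper:
    """Hypothetical stand-in for a solver variable that supports float()."""

    def __init__(self, value):
        self.value = value

    def __float__(self):
        # float() returns whatever the value is at the moment of the call
        return float(self.value)


def predict(x):
    """Stand-in for a fitted model's predict(): converts its input to a float."""
    return 2.0 * float(x) + 10.0


var = FloatWrapper(0.0)
prediction = predict(var)  # evaluated once, at var.value == 0.0
var.value = 5.0            # a solver updating the variable later
print(prediction)          # still the number computed from the old value
print(predict(var))        # a fresh call is needed to see the new value
```

Under this assumption, the objective passed to m.Minimize() would contain a fixed number instead of a function of x2, so the wrapper alone would not make the optimization work.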


Solution

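  • predict() expects numeric arrays, whereas x2 is a symbolic gekko variable rather than a number, which is why NumPy cannot convert it. The Gekko ML functions can import sklearn, gpflow, and tensorflow models instead. The model must be fitted first; its parameters and equations are then imported into the gekko model.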

    Gekko_Model = ML.Gekko_LinearRegression(model,Xtrain,RMSE,Gekko_Model)
    
    • Imports a trained linear regression model from sklearn. The imported model estimates prediction uncertainty with the delta method.
    • model: a trained sklearn model, either Ridge regression or linear regression.
    • Xtrain: the input training set, needed to calculate the prediction interval.
    • RMSE: the root mean squared error calculated during training over the entire data set, also used to calculate the prediction interval.
    • Gekko_Model: the Gekko model (created with GEKKO()) to which the new linear regression model is appended.
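
For a plain linear model, the idea of carrying the fitted parameters into an equation can also be sketched by hand. The block below is an illustration, not the Gekko ML implementation: linear_prediction is a hypothetical helper, and np.polyfit is swapped in for LinearRegression to keep the sketch dependency-light (with scikit-learn, the same two numbers should be available as model.coef_[0] and model.intercept_ after fitting). It uses the question's synthetic data.

```python
import numpy as np

# Same synthetic data as in the question: y = 2*x + 10
x = np.arange(100, dtype=float)
y = 2.0 * x + 10.0

# Fit a straight line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

# Convert to plain Python floats so that NumPy scalar types stay out of
# any later arithmetic with non-numeric objects such as solver variables.
slope, intercept = float(slope), float(intercept)


def linear_prediction(x_var, slope, intercept):
    """Prediction written as plain arithmetic instead of predict().

    Only * and + are used, so x_var can be a float, a NumPy array, or
    any object that overloads those operators.
    """
    return slope * x_var + intercept


# Numeric check: the expression reproduces the fitted line.
x_new = np.arange(100, 200, dtype=float)
y_new = linear_prediction(x_new, slope, intercept)

# For this one-dimensional case the minimizer of (y2 - prediction)**2
# has a closed form, which gives a reference value for a solver result.
y2 = 50.0
x2_expected = (y2 - intercept) / slope
print(f"x2 = {x2_expected:.3f}")
```

Because gekko variables overload the arithmetic operators, the same expression could be placed in the objective of the question's model, for example m.Minimize((y2 - linear_prediction(x2, slope, intercept)) ** 2), and the solved x2 should then be close to the closed-form value printed above. This shortcut relies on the model having a simple closed-form equation, so it is not a general replacement for predict().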