python pandas dataframe simulation gekko

Plotting Gekko Model With Pandas Datafram

I'm trying to compare a theoretical model to historical data. My plan is to input the real world data into the equation from a few pandas data frames with Gekko(skip to #EQUATION#):

import pandas as pd
import pylab as plt
import numpy as np
from gekko import GEKKO

#DATA#
#oil supply
data = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&s=mcrfpus2&f=m')
s=data[4]
sply = s.loc[s['Year'] >= 1984]
supplyvalues = sply.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val2 = supplyvalues.dropna()
protoval3 = supplyvalues.shift(1)
val3=protoval3.dropna()
#price of oil
p = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=EMA_EPM0_PTG_NUS_DPG&f=M')
Op=p[4]
prc = Op.loc[Op['Year'] >= 1984]
pricevalues = prc.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val1 = pricevalues.dropna()

#EQUATION#
m=GEKKO()
tm = np.linspace(0,10,50)
t=m.Param(value = tm)
m.time = tm
m.options.IMODE=4

#forming equation
a,b,c = m.Array(m.Param,3)
a.value = val1.values
b.value = val2.values
c.value = val3.values
x = m.Var()
m.Equation(a == c-a*b)
m.solve(disp=False)

Issue is that every time I try it, I get this error:

Exception: Data arrays must have the same length, and match time discretization in dynamic problems

I checked the shape of the data frames and they are equal, so I'm not sure what's going wrong. Kind of annoying since I feel like I'm really close. I would really appreciate any help you can give!

p.s: I'm still a beginner with programming so I'd also really appreciate if you can describe what your new lines of code do(if you have any). This is totally optional!

Solution

The parameters val1, val2, and val3 are 38x12 matrices.

      Jan    Feb    Mar    Apr    May  ...    Aug    Sep    Oct    Nov    Dec
1   0.906  0.902  0.907  0.929  0.934  ...  0.892  0.897  0.905  0.899  0.880
3   0.846  0.836  0.871  0.924  0.944  ...  0.940  0.919  0.908  0.917  0.919
4   0.893  0.805  0.654  0.591  0.638  ...  0.555  0.562  0.532  0.532  0.542
5   0.597  0.621  0.627  0.649  0.663  ...  0.716  0.705  0.697  0.694  0.674
6   0.649  0.633  0.625  0.660  0.684  ...  0.718  0.700  0.680  0.676  0.661

A few modifications are needed to give a successful solution.

Select just one month or align the annual data to the time discretization points.

#forming equation
a,b,c = m.Array(m.Param,3)
a.value = val1['Jan'].values
b.value = val2['Jan'].values
c.value = val3['Jan'].values

If multiply months of simulation are needed then place each parameter into a vector of values.

Align the time points with the inputs for val1, val2, and val3.

tm = np.linspace(0,10,38)

Check the equations.

Each month has 38 time points so the time discretization also needs this many time points. The equation doesn't look complete - a dynamic simulation typically has differential equations such as x.dt()==-x + a - b + c.

The import needed the lxml package. Installed with:

pip install lxml

Here is a complete script that solves successfully, but isn't the intended simulation. Perhaps it is a good starting point for developing the simulation problem.

import pandas as pd
import pylab as plt
import numpy as np
from gekko import GEKKO

#DATA#
#oil supply
data = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&s=mcrfpus2&f=m')
s=data[4]
sply = s.loc[s['Year'] >= 1984]
supplyvalues = sply.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val2 = supplyvalues.dropna()
protoval3 = supplyvalues.shift(1)
val3=protoval3.dropna()
#price of oil
p = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=EMA_EPM0_PTG_NUS_DPG&f=M')
Op=p[4]
prc = Op.loc[Op['Year'] >= 1984]
pricevalues = prc.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val1 = pricevalues.dropna()

print(np.shape(val1.values))

#EQUATION#
m=GEKKO(remote=False)
tm = np.linspace(0,10,38)
print(len(tm))
t=m.Param(value = tm)
m.time = tm
m.options.IMODE=4

#forming equation
a,b,c = m.Array(m.Param,3)
a.value = val1['Jan'].values
b.value = val2['Jan'].values
c.value = val3['Jan'].values
x = m.Var()
m.Equation(x == c-a*b)
m.solve(disp=True)