Search code examples
pythonpandasdataframesimulationgekko

Plotting Gekko Model With Pandas Datafram


I'm trying to compare a theoretical model to historical data. My plan is to input the real world data into the equation from a few pandas data frames with Gekko(skip to #EQUATION#):

import pandas as pd
import pylab as plt
import numpy as np
from gekko import GEKKO

#DATA#
#oil supply
data = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&s=mcrfpus2&f=m')
s=data[4]
sply = s.loc[s['Year'] >= 1984]
supplyvalues = sply.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val2 = supplyvalues.dropna()
protoval3 = supplyvalues.shift(1)
val3=protoval3.dropna()
#price of oil
p = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=EMA_EPM0_PTG_NUS_DPG&f=M')
Op=p[4]
prc = Op.loc[Op['Year'] >= 1984]
pricevalues = prc.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
val1 = pricevalues.dropna()

#EQUATION#
m=GEKKO()
tm = np.linspace(0,10,50)
t=m.Param(value = tm)
m.time = tm
m.options.IMODE=4

#forming equation
a,b,c = m.Array(m.Param,3)
a.value = val1.values
b.value = val2.values
c.value = val3.values
x = m.Var()
m.Equation(a == c-a*b)
m.solve(disp=False)

Issue is that every time I try it, I get this error:

Exception: Data arrays must have the same length, and match time discretization in dynamic problems

I checked the shape of the data frames and they are equal, so I'm not sure what's going wrong. Kind of annoying since I feel like I'm really close. I would really appreciate any help you can give!

p.s: I'm still a beginner with programming so I'd also really appreciate if you can describe what your new lines of code do(if you have any). This is totally optional!


Solution

  • The parameters val1, val2, and val3 are 38x12 matrices.

          Jan    Feb    Mar    Apr    May  ...    Aug    Sep    Oct    Nov    Dec
    1   0.906  0.902  0.907  0.929  0.934  ...  0.892  0.897  0.905  0.899  0.880
    3   0.846  0.836  0.871  0.924  0.944  ...  0.940  0.919  0.908  0.917  0.919
    4   0.893  0.805  0.654  0.591  0.638  ...  0.555  0.562  0.532  0.532  0.542
    5   0.597  0.621  0.627  0.649  0.663  ...  0.716  0.705  0.697  0.694  0.674
    6   0.649  0.633  0.625  0.660  0.684  ...  0.718  0.700  0.680  0.676  0.661
    

    A few modifications are needed to give a successful solution.

    1. Select just one month or align the annual data to the time discretization points.
    #forming equation
    a,b,c = m.Array(m.Param,3)
    a.value = val1['Jan'].values
    b.value = val2['Jan'].values
    c.value = val3['Jan'].values
    

    If multiply months of simulation are needed then place each parameter into a vector of values.

    1. Align the time points with the inputs for val1, val2, and val3.
    tm = np.linspace(0,10,38)
    
    1. Check the equations.

    Each month has 38 time points so the time discretization also needs this many time points. The equation doesn't look complete - a dynamic simulation typically has differential equations such as x.dt()==-x + a - b + c.

    1. The import needed the lxml package. Installed with:
    pip install lxml
    

    Here is a complete script that solves successfully, but isn't the intended simulation. Perhaps it is a good starting point for developing the simulation problem.

    import pandas as pd
    import pylab as plt
    import numpy as np
    from gekko import GEKKO
    
    #DATA#
    #oil supply
    data = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=pet&s=mcrfpus2&f=m')
    s=data[4]
    sply = s.loc[s['Year'] >= 1984]
    supplyvalues = sply.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
    val2 = supplyvalues.dropna()
    protoval3 = supplyvalues.shift(1)
    val3=protoval3.dropna()
    #price of oil
    p = pd.read_html('https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=EMA_EPM0_PTG_NUS_DPG&f=M')
    Op=p[4]
    prc = Op.loc[Op['Year'] >= 1984]
    pricevalues = prc.filter(items = ['Jan', 'Feb', 'Mar', 'Apr', 'May','Jun','Jul','Aug','Sep', 'Oct', 'Nov','Dec'])
    val1 = pricevalues.dropna()
    
    print(np.shape(val1.values))
    
    #EQUATION#
    m=GEKKO(remote=False)
    tm = np.linspace(0,10,38)
    print(len(tm))
    t=m.Param(value = tm)
    m.time = tm
    m.options.IMODE=4
    
    #forming equation
    a,b,c = m.Array(m.Param,3)
    a.value = val1['Jan'].values
    b.value = val2['Jan'].values
    c.value = val3['Jan'].values
    x = m.Var()
    m.Equation(x == c-a*b)
    m.solve(disp=True)