Tags: python, statsmodels, gmm

Issue with using statsmodels.sandbox.regression.gmm.GMM


I want to estimate an interest rate process using GMM.

I used the following notebook as a reference: https://github.com/josef-pkt/misc/blob/master/notebooks/ex_gmm_gamma.ipynb

My code is as follows.

import numpy as np
import pandas as pd
from statsmodels.sandbox.regression.gmm import GMM

cd = np.array([1.5, 1.5, 1.7, 2.2, 2.0, 1.8, 1.8, 2.2, 1.9, 1.6, 1.8, 2.2, 2.0, 1.5, 1.1, 1.5, 1.4, 1.7, 1.42, 1.9])
dcd = np.array([0, 0.2 ,0.5, -0.2, -0.2, 0, 0.4, -0.3, -0.3, 0.2, 0.4, -0.2, -0.5, -0.4, 0.4, -0.1, 0.3, -0.28, 0.48, 0.2])
inst = np.column_stack((np.ones(len(cd)), cd))

class gmm(GMM):
    def momcond(self, params):
        p0, p1, p2, p3 = params
        endog = self.endog
        exog = self.exog
        inst = self.instrument   

        error1 = endog - p0 - p1 * exog
        error2 = (endog - p0 - p1 * exog) ** 2 - p2 * (exog ** (2 * p3)) / 12
        error3 = (endog - p0 - p1 * exog) * inst[:,0]
        error4 = ((endog - p0 - p1 * exog) ** 2 - p2 * (exog ** (2 * p3)) / 12) * inst[:,1]
        g = np.column_stack((error1, error2, error3, error4))
        return g


beta0 = np.array([0.1, 0.1, 0.01, 1])

gmm(endog = dcd, exog = cd, instrument = inst, k_moms=4, k_params=4).fit(beta0)

But it raises the following error:

ValueError: shapes (80,) and (4,4) not aligned: 80 (dim 0) != 4 (dim 0)

How can I fix this problem?


Solution

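  • The shape problem arises because exog is a 2-D column array of shape (20, 1), while endog and the indexed instrument are 1-D with shape (20,). Broadcasting turns each error term into a (20, 20) array, so the stacked moment conditions have 80 columns instead of 4. I added a squeeze to exog so that exog is also 1-D.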
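The broadcasting effect can be reproduced with plain NumPy arrays. This is only a minimal sketch: the array names are made up for illustration, and it assumes exog is held as a (20, 1) column as described above.

```python
import numpy as np

nobs = 20
endog = np.zeros(nobs)          # 1-D, shape (20,)
exog_col = np.ones((nobs, 1))   # 2-D column, shape (20, 1) (assumed layout of exog)
inst0 = np.ones(nobs)           # 1-D, stands in for inst[:, 0]

# (20,) combined with (20, 1) broadcasts to (20, 20)
err_bad = (endog - 0.1 - 0.1 * exog_col) * inst0
print(err_bad.shape)            # (20, 20)

# Four such blocks side by side give 80 columns instead of 4
g_bad = np.column_stack([err_bad] * 4)
print(g_bad.shape)              # (20, 80)

# After squeeze() everything is 1-D and each moment condition is one column
exog_1d = exog_col.squeeze()
err_ok = (endog - 0.1 - 0.1 * exog_1d) * inst0
g_ok = np.column_stack([err_ok] * 4)
print(g_ok.shape)               # (20, 4)
```

The 80 in the error message is the number of columns of the stacked array, which becomes a length-80 vector once the moment conditions are averaged over observations.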

    The second problem is a typo in the instrument index for moment condition 3, which should use
    error3 = (endog - p0 - p1 * exog) * inst[:,1]
    With inst[:,0], which is the constant column, error3 is identical to error1, so once the shape problem is fixed the fit raises a LinAlgError.
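Why the duplicated moment condition is a problem can be seen without statsmodels. The sketch below uses the data from the question and evaluates the residual at the starting values p0 = p1 = 0.1. The rank check is only an illustration of the singularity; it is not a reproduction of where exactly the fit fails internally.

```python
import numpy as np

cd = np.array([1.5, 1.5, 1.7, 2.2, 2.0, 1.8, 1.8, 2.2, 1.9, 1.6,
               1.8, 2.2, 2.0, 1.5, 1.1, 1.5, 1.4, 1.7, 1.42, 1.9])
dcd = np.array([0, 0.2, 0.5, -0.2, -0.2, 0, 0.4, -0.3, -0.3, 0.2,
                0.4, -0.2, -0.5, -0.4, 0.4, -0.1, 0.3, -0.28, 0.48, 0.2])
inst = np.column_stack((np.ones(len(cd)), cd))

resid = dcd - 0.1 - 0.1 * cd          # residual at the starting values

error1 = resid
error3_typo = resid * inst[:, 0]      # multiplying by the constant changes nothing
error3_fixed = resid * inst[:, 1]     # multiplying by cd gives a new condition

print(np.array_equal(error1, error3_typo))          # True

# Cross-product of the moment columns: singular when two columns coincide
g_typo = np.column_stack((error1, error3_typo))
g_fixed = np.column_stack((error1, error3_fixed))
print(np.linalg.matrix_rank(g_typo.T @ g_typo))     # 1
print(np.linalg.matrix_rank(g_fixed.T @ g_fixed))   # 2
```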

    It works for me after making these two changes, but I don't know whether the estimated parameters make sense in the application.

    import numpy as np
    from statsmodels.sandbox.regression.gmm import GMM

    cd = np.array([1.5, 1.5, 1.7, 2.2, 2.0, 1.8, 1.8, 2.2, 1.9, 1.6, 1.8, 2.2, 2.0, 1.5, 1.1, 1.5, 1.4, 1.7, 1.42, 1.9])
    dcd = np.array([0, 0.2 ,0.5, -0.2, -0.2, 0, 0.4, -0.3, -0.3, 0.2, 0.4, -0.2, -0.5, -0.4, 0.4, -0.1, 0.3, -0.28, 0.48, 0.2])
    # Instruments: a constant and cd
    inst = np.column_stack((np.ones(len(cd)), cd))
    
    class gmm(GMM):
        def momcond(self, params):
            p0, p1, p2, p3 = params
            endog = self.endog
            exog = self.exog.squeeze()
            inst = self.instrument   
    
            error1 = endog - p0 - p1 * exog
            error2 = (endog - p0 - p1 * exog) ** 2 - p2 * (exog ** (2 * p3)) / 12
            error3 = (endog - p0 - p1 * exog) * inst[:,1]
            error4 = ((endog - p0 - p1 * exog) ** 2 - p2 * (exog ** (2 * p3)) / 12) * inst[:,1]
            g = np.column_stack((error1, error2, error3, error4))
            return g
    
    
    beta0 = np.array([0.1, 0.1, 0.01, 1])
    res = gmm(endog = dcd, exog = cd, instrument = inst, k_moms=4, k_params=4).fit(beta0)
    

    There is a bug in the GMM summary: it relies on a list of parameter names that is incorrect and too short. Overriding the parameter names makes summary work:

    res.model.exog_names[:] = 'p0 p1 p2 p3'.split()
    print(res.summary())
    
    
    
    
                                    gmm Results                                  
    ==============================================================================
    Dep. Variable:                      y   Hansen J:                    1.487e-10
    Model:                            gmm   Prob (Hansen J):                   nan
    Method:                           GMM                                         
    Date:                Wed, 14 Mar 2018                                         
    Time:                        09:38:38                                         
    No. Observations:                  20                                         
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    p0             0.9890      0.243      4.078      0.000       0.514       1.464
    p1            -0.5524      0.129     -4.281      0.000      -0.805      -0.299
    p2             1.2224      0.940      1.300      0.193      -0.620       3.065
    p3            -0.3376      0.641     -0.527      0.598      -1.593       0.918
    ==============================================================================
    

    Extra

    In the corrected version, the constant in the instrument is no longer used. It could be removed, or the moment conditions could be vectorized over the instruments as in the following. Note that I convert endog to a 2-D column array so that it matches the shape of exog and the instruments.

    class gmm(GMM):
        def momcond(self, params):
            p0, p1, p2, p3 = params
            endog = self.endog[:, None]
            exog = self.exog
            inst = self.instrument   
    
            error3 = (endog - p0 - p1 * exog) * inst
            error4 = ((endog - p0 - p1 * exog) ** 2 - p2 * (exog ** (2 * p3)) / 12) * inst
            g = np.column_stack((error3, error4))
            return g
    
    
    beta0 = np.array([0.1, 0.1, 0.01, 1])
    res = gmm(endog = dcd, exog = cd, instrument = inst, k_moms=4, k_params=4).fit(beta0)
    res.model.exog_names[:] = 'p0 p1 p2 p3'.split()
    print(res.summary())
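As a quick sanity check, the vectorized moment conditions can be compared with the explicit ones using plain NumPy. The two standalone functions below are hypothetical helpers that mirror the two momcond methods. They are not part of statsmodels, and they take 1-D endog and exog as inputs.

```python
import numpy as np

cd = np.array([1.5, 1.5, 1.7, 2.2, 2.0, 1.8, 1.8, 2.2, 1.9, 1.6,
               1.8, 2.2, 2.0, 1.5, 1.1, 1.5, 1.4, 1.7, 1.42, 1.9])
dcd = np.array([0, 0.2, 0.5, -0.2, -0.2, 0, 0.4, -0.3, -0.3, 0.2,
                0.4, -0.2, -0.5, -0.4, 0.4, -0.1, 0.3, -0.28, 0.48, 0.2])
inst = np.column_stack((np.ones(len(cd)), cd))
params = (0.1, 0.1, 0.01, 1.0)


def momcond_explicit(params, endog, exog, inst):
    """Hypothetical helper: four moment conditions written out one by one."""
    p0, p1, p2, p3 = params
    resid = endog - p0 - p1 * exog
    var_err = resid ** 2 - p2 * exog ** (2 * p3) / 12
    return np.column_stack((resid, var_err,
                            resid * inst[:, 1], var_err * inst[:, 1]))


def momcond_vectorized(params, endog, exog, inst):
    """Hypothetical helper: both errors multiplied by all instruments."""
    p0, p1, p2, p3 = params
    endog = endog[:, None]            # (nobs, 1)
    exog = exog[:, None]              # (nobs, 1)
    resid = endog - p0 - p1 * exog
    var_err = resid ** 2 - p2 * exog ** (2 * p3) / 12
    return np.column_stack((resid * inst, var_err * inst))


g_explicit = momcond_explicit(params, dcd, cd, inst)
g_vec = momcond_vectorized(params, dcd, cd, inst)
print(g_explicit.shape, g_vec.shape)                     # (20, 4) (20, 4)

# Same four conditions, only the column order differs
print(np.allclose(g_vec, g_explicit[:, [0, 2, 1, 3]]))   # True
```

Both versions contain the same four moment conditions, just in a different column order.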
    

    Debugging

    We can check that the user-provided moment conditions have the correct shape by just creating the model instance and calling momcond. The result should be (nobs, k_moms), here (20, 4).

    mod = gmm(endog = dcd, exog = cd, instrument = inst, k_moms=4, k_params=4)
    mod.momcond(beta0).shape
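The same check can be wrapped in a small helper that fails with a readable message before fit is called. This is a sketch only: check_momcond_shape is a hypothetical name, and the arrays below are stand-ins for the output of momcond.

```python
import numpy as np


def check_momcond_shape(g, nobs, k_moms):
    """Hypothetical helper: raise if g does not have shape (nobs, k_moms)."""
    g = np.asarray(g)
    if g.shape != (nobs, k_moms):
        raise ValueError(
            f"momcond returned shape {g.shape}, expected {(nobs, k_moms)}"
        )
    return g.shape


good = np.zeros((20, 4))    # stand-in for a correct momcond result
bad = np.zeros((20, 80))    # stand-in for the broadcast result in the question

print(check_momcond_shape(good, 20, 4))   # (20, 4)

try:
    check_momcond_shape(bad, 20, 4)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With the model instance above, it would be called as check_momcond_shape(mod.momcond(beta0), len(dcd), 4).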