Tags: python, statistics, statsmodels

How to calculate a prediction interval for a fitted statsmodels OLS model?


I was given the following model:

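$$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$$

Thus, the expected change in output from increasing X by one unit is given by:

$$\frac{\partial E[y \mid X]}{\partial X} = \beta_1 + 2 \beta_2 X$$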

Let's say I assume a value of 40 for X.

How can I calculate a 95% confidence interval for the effect of increasing X by 0.25 units?


What follows is a replicable example.

# Generate data
import pandas as pd
from scipy import stats as st
from statsmodels.api import OLS
df = pd.DataFrame({'const':1,'X':st.norm(loc=40, scale=5).rvs(1000)})
df['X_sq'] = df['X'].pow(2)
df['y'] = 1200 + df['X'] + df['X_sq'] + st.norm().rvs(1000)
df = df[['y','const','X','X_sq']]

# Declare and fit model
y = df['y']
X = df[['const','X','X_sq']]
m1 = OLS(endog=y, exog=X).fit()

# Assume a value for `Xi`
x = 40

# Predicted marginal effect of increasing `Xi` by ONE UNIT
Mg = m1.params['X'] + (2 * m1.params['X_sq'] * x)

Great, so the expected change in y that follows from increasing X from 40 to 41 is equal to Mg.

How can I calculate a 95% confidence interval for a marginal change in X of 0.25 units?

As a hint, I think this can be done with m1.t_test()


Solution

  • t_test works because the statistic Mg is linear in the parameters. Note that Mg is the derivative, i.e. the marginal change at the point x. To get a discrete change, we can either multiply Mg by dx = x1 - x to get a linear approximation, or use the discrete change in y, which is also linear in the parameters.

    We can use a restriction defined by either a string or an explicit constraint matrix.

    I use old-fashioned string interpolation (the % operator) to build the restriction string, and I added a seed before the simulation to get replicable results.

    Marginal change given by derivative Mg

    import numpy as np
    np.random.seed(987125348)  # run this before generating the data

    "X + %f * X_sq" % (2 * x)
    'X + 80.000000 * X_sq'
    
    m1.t_test("X + %f * X_sq" % (2 * x))
    <class 'statsmodels.stats.contrast.ContrastResults'>
                                 Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    c0            81.0089      0.007   1.24e+04      0.000      80.996      81.022
    ==============================================================================
    
    Mg
    81.00891173785777
    

    with explicit restriction matrix:

    m1.t_test([0, 1, 2 * x])
    <class 'statsmodels.stats.contrast.ContrastResults'>
                                 Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    c0            81.0089      0.007   1.24e+04      0.000      80.996      81.022
    ==============================================================================
    

    with a t-test of the null hypothesis that the value is 80

    m1.t_test("X + %f * X_sq = 80" % (2 * x))
    <class 'statsmodels.stats.contrast.ContrastResults'>
                                 Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    c0            81.0089      0.007    154.100      0.000      80.996      81.022
    ==============================================================================
    

    Confidence interval for a 0.25 change

    What is the effect of increasing x from 40 to 40.25?

    The change in the predicted value can be written as a function that is linear in parameters, so t_test can still be used for this.
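
To see why the change is linear in the parameters, it may help to write it out. The symbols below are notation assumed here for the fitted coefficients on const, X, and X_sq; the intercept cancels in the difference.

```latex
\begin{aligned}
\hat{y}(x_1) - \hat{y}(x_0)
  &= \hat\beta_1 (x_1 - x_0) + \hat\beta_2 (x_1^2 - x_0^2)
  && \text{(discrete change)} \\
\left.\frac{\partial \hat{y}}{\partial x}\right|_{x_0} (x_1 - x_0)
  &= \hat\beta_1 (x_1 - x_0) + 2 \hat\beta_2 x_0 (x_1 - x_0)
  && \text{(linear approximation)} \\
\text{difference}
  &= \hat\beta_2 (x_1 - x_0)^2
\end{aligned}
```

With x1 - x0 = 0.25 and a coefficient on X_sq close to 1, the two versions should differ by roughly 0.0625.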

    x0 = 40
    x1 = 40.25
    # the constant cancels in the difference, so its column can be set to 0
    m1.predict([0, x1, x1**2]) - m1.predict([0, x0, x0**2])
    array([20.314731])
    

    discrete change

    m1.t_test([0, (x1 - x0), (x1**2 - x0**2)])
    <class 'statsmodels.stats.contrast.ContrastResults'>
                                 Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    c0            20.3147      0.002   1.24e+04      0.000      20.312      20.318
    ==============================================================================
    

    linear approximation using derivative at x0 = 40

    dx = x1 - x0
    m1.t_test([0, 1 * dx, 2 * x0 * dx])
    <class 'statsmodels.stats.contrast.ContrastResults'>
                                 Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    c0            20.2522      0.002   1.24e+04      0.000      20.249      20.255
    ==============================================================================