Creating residual plots using statsmodels


I am trying to create residual plots with statsmodels.graphics.regressionplots.plot_regress_exog, but I get an error saying that the independent variable is not found. The full traceback is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-43dae6c58d5d> in <module>
     28 
     29 #produce regression plots
---> 30 fig = sm.graphics.plot_regress_exog(model,'Pow', fig=fig)

~\Anaconda3\lib\site-packages\statsmodels\graphics\regressionplots.py in plot_regress_exog(results, exog_idx, fig)
    218     fig = utils.create_mpl_fig(fig)
    219 
--> 220     exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
    221     results = maybe_unwrap_results(results)
    222 

~\Anaconda3\lib\site-packages\statsmodels\graphics\utils.py in maybe_name_or_idx(idx, model)
    110     else: # assume we've got a string variable
    111         exog_name = idx
--> 112         exog_idx = model.exog_names.index(idx)
    113 
    114     return exog_name, exog_idx

ValueError: 'Pow' is not in list

Could you please help me figure out the problem?

Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols


wt_160 = [575, 542, 530, 539, 570]
wt_180 = [565, 593, 590, 579, 610]
wt_200 = [600, 651, 610, 637, 629]
wt_220 = [725, 700, 715, 685, 710]
pwr = [160, 180, 200, 220]


e_rates = wt_160 + wt_180 + wt_200 + wt_220
pow_lvl = (['160 W']*len(wt_160)) + (['180 W']*len(wt_180)) + (['200 W']*len(wt_200)) + (['220 W']*len(wt_220))

df = pd.DataFrame({'Pow': pow_lvl, 'E_Rates':e_rates})


model = ols('E_Rates ~ C(Pow)', df).fit()
anova_result = sm.stats.anova_lm(model, typ=2)  # anova_lm was never imported; the keyword is typ, not type
print(model.summary())

import statsmodels.api as sm

fig = plt.figure(figsize=(12,8))

#produce regression plots
fig = sm.graphics.plot_regress_exog(model,'Pow', fig=fig)

Solution

  • Pow is a categorical predictor, so the design matrix has no column named 'Pow'. It has one dummy column per non-reference level, and you have to pass the name of one of those columns. For example,

    import statsmodels.api as sm
    
    fig = plt.figure(figsize=(12,8))
    
    #produce regression plots
    fig = sm.graphics.plot_regress_exog(model,'C(Pow)[T.180 W]', fig=fig)
    

    will produce the regression plots for the 180 W level.

    To see the names of your predictor variables, look at the params attribute of the model:

    >>> model.params
    Intercept          551.2
    C(Pow)[T.180 W]     36.2
    C(Pow)[T.200 W]     74.2
    C(Pow)[T.220 W]    155.8