Tags: python, python-3.x, pandas, linear-regression, statsmodels

How to Create Variables from OLS Regression Results?


import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm 
import matplotlib.pyplot as plt

d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Madrid','Lisbon','Madrid','London','Tokyo','London','Tokyo'], 
     'Card': ['Visa','Visa','Visa','Master Card','Bitcoin','Master Card','Bitcoin','Visa','Master Card','Visa','Bitcoin'],
     'Client Number':[1,2,3,4,5,6,7,8,9,10,11],
     }

d = pd.DataFrame(data=d).set_index('Client Number')

df = pd.get_dummies(d,prefix='', prefix_sep='')


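# Note: 'New York', 'No', 'Yes' and 'Total' are not in the sample data above;
# they are assumed to come from the full dataset.
X = df[['Lisbon','London','Madrid','New York','Tokyo','Bitcoin','Master Card','Visa','No','Yes']]
Y = df['Total']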

X1 = sm.add_constant(X)
reg = sm.OLS(Y, X1).fit()

reg.summary()


I want to extract the coefficient of each variable so that I can apply the model to new data. How do I do that?


Solution

  • reg.params contains the parameter estimates. Other quantities presented in the summary are available in reg.bse (standard errors), reg.tvalues (t-statistics) and reg.pvalues (P-values).

    The full set of available properties can be seen in the documentation:

    https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.html

    If you want to apply the same parameters to a different dataset, the simplest method is to construct a new OLS model with the new data, e.g.,

    mod = sm.OLS(y_new, x_new)
    

    and then use the predict method,

    mod.predict(reg.params)
    

    where reg.params are from your original fit. Note that x_new must contain the same variables (including the constant column), in the same column order, as the original regression.