How can I create a DataFrame from the printed results of a loop over multiple columns? Is there any reference for this? Thanks.
for i in range(2):
    test = regression.linear_model.OLS(df[['s'+str(i+1)]], sm.add_constant(df[['benchmark']])).fit()
    print(test.params)
    print(test.tvalues)
    print(test.pvalues)
output:
const 0.018959
benchmark 0.770473
dtype: float64
const 3.586451
benchmark 8.573976
dtype: float64
const 4.329121e-04
benchmark 4.732058e-15
dtype: float64
const 0.018192
benchmark 0.778906
dtype: float64
const 3.180102
benchmark 8.009541
dtype: float64
const 1.736846e-03
benchmark 1.450519e-13
dtype: float64
You can collect them into a pandas DataFrame manually, as follows (name the columns whatever you want). Here results is the fitted model returned by .fit():
pd_results = pd.DataFrame({"ols_params": results.params, "ols_tvalues": results.tvalues, "ols_pvalues": results.pvalues})
I'll start with the statsmodels example from here so I have data to work with ( https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html ) and explicitly use linear_model:
import statsmodels.api as sm
import statsmodels.regression.linear_model as linear_model
import numpy as np
import pandas as pd
# sm dataset
duncan_prestige = sm.datasets.get_rdataset("Duncan", "carData")
Y = duncan_prestige.data['income']
X = duncan_prestige.data['education']
X = sm.add_constant(X)
# linear_model OLS
model = linear_model.OLS(Y,X)
results = model.fit()
results.params
pd_results = pd.DataFrame({"ols_params": results.params, "ols_tvalues": results.tvalues, "ols_pvalues": results.pvalues})
pd_results