
Transform the printed results of an iteration into a pandas DataFrame


How can I create a DataFrame from the printed results of an iteration over multiple columns? Is there any reference for this? Thanks.

    for i in range(2):
        test = regression.linear_model.OLS(df[['s'+str(i+1)]], sm.add_constant(df[['benchmark']])).fit()
        print(test.params)
        print(test.tvalues)
        print(test.pvalues)

output:

const        0.018959
benchmark    0.770473
dtype: float64
const        3.586451
benchmark    8.573976
dtype: float64
const        4.329121e-04
benchmark    4.732058e-15
dtype: float64
const        0.018192
benchmark    0.778906
dtype: float64
const        3.180102
benchmark    8.009541
dtype: float64
const        1.736846e-03
benchmark    1.450519e-13
dtype: float64

Solution

  • You can organize them into a pandas DataFrame manually, like the following (name the columns whatever you want):

    pd_results = pd.DataFrame({"ols_params": results.params, "ols_tvalues": results.tvalues, "ols_pvalues": results.pvalues})
    

    I'll start with the statsmodels example from here so I have data to work with ( https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html ) and explicitly use linear_model:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.regression.linear_model as linear_model
    # sm dataset (downloaded from the Rdatasets repository)
    duncan_prestige = sm.datasets.get_rdataset("Duncan", "carData")
    Y = duncan_prestige.data['income']
    X = duncan_prestige.data['education']
    X = sm.add_constant(X)
    # linear_model OLS
    model = linear_model.OLS(Y, X)
    results = model.fit()
    results.params
    # one row per coefficient, one column per statistic
    pd_results = pd.DataFrame({"ols_params": results.params, "ols_tvalues": results.tvalues, "ols_pvalues": results.pvalues})
    pd_results
    
