python, statsmodels, glm

How can I get the name of the dependent variable from OLS results in Python?


I'm trying to get information from OLS results after running regressions in a for loop.

For example,

depvars = ['y1', 'y2', 'y3', ...]
models = [ "~ x1 + x2", "~ x1 + x2 + x3", ...]
results = []
for depvar in depvars:
    for model in models:
        results.append(smf.glm(formula = depvar + model, data= data).fit())

I can get information such as the estimates and p-values with results[0].params and results[0].pvalues.

But I also want to get the name of the dependent variable (y1, y2, ...) used in each regression so that I can tell which parameters are for which variable.

For instance, I would like something like results[0].depvar to return y1.

Thank you! :)


Solution

  • The name is stored on the fitted result's model, as model.endog_names. For example:

    import statsmodels.formula.api as smf
    import numpy as np
    import pandas as pd
    
    data = pd.DataFrame(np.random.uniform(0,1,(50,6)),
                       columns=['x1','x2','x3','y1','y2','y3'])
    
    depvars = ['y1', 'y2', 'y3']
    models = [ "~ x1 + x2", "~ x1 + x2 + x3"]
    
    results = []  # collect one fitted result per (depvar, model) pair
    for depvar in depvars:
        for model in models:
            results.append(smf.glm(formula=depvar + model, data=data).fit())
    
    print("dependent:",results[0].model.endog_names)
    print("independent:",results[0].model.exog_names)
    print("coefficients:\n",results[0].params)
    

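    This prints something like the following (the exact coefficients vary from run to run, since the data is random):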

    dependent: y1
    independent: ['Intercept', 'x1', 'x2']
    coefficients:
     Intercept    0.468554
    x1           0.258408
    x2          -0.138862
    dtype: float64
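
To tell which parameters belong to which variable across all the fits at once, one option is to gather everything into a single table keyed by model.endog_names. This is a minimal sketch under some assumptions: it reuses the random data from the answer above with a fixed seed added for reproducibility, and the names `rows` and `summary` and the column labels are made up for illustration, not part of statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same kind of random data as in the answer; the seed is an assumption
# added here only so that repeated runs give the same numbers.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.uniform(0, 1, (50, 6)),
                    columns=['x1', 'x2', 'x3', 'y1', 'y2', 'y3'])

depvars = ['y1', 'y2', 'y3']
models = ["~ x1 + x2", "~ x1 + x2 + x3"]

# One fitted result per (dependent variable, right-hand side) pair.
results = [smf.glm(formula=depvar + model, data=data).fit()
           for depvar in depvars
           for model in models]

# Build one row per estimated coefficient, labelled with the
# dependent variable name taken from the fitted model itself.
rows = []
for res in results:
    for term, coef in res.params.items():
        rows.append({
            "depvar": res.model.endog_names,         # e.g. 'y1'
            "n_terms": len(res.model.exog_names),    # distinguishes the two formulas
            "term": term,                            # 'Intercept', 'x1', ...
            "coef": coef,
            "pvalue": res.pvalues[term],
        })

summary = pd.DataFrame(rows)
print(summary.head())
```

From there, a filter such as summary[summary["depvar"] == "y2"] would pull out every estimate for one dependent variable, without having to remember its position in the results list.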