Tags: python, group-by, regression, coefficients

Regression by group and display output in python


Hi, I have a quick question about regression analysis in Python. I have the following dataframe:

group      Y        X
 1         9        3
 1         5        4
 1         3        1
 2         1        6
 2         2        4
 2         3        9

Y is the dependent variable and X is the independent variable. I want to run the regression Y = a + bX separately for each group and output another dataframe that contains the coefficient, t-stat, intercept and R-squared for each group. So the output dataframe should look like this:

group   coefficient   t-stats    intercept    r-square
  1        0.25         1.4        4.3         0.43
  2        0.30         2.4        3.6         0.49
 ...        ...         ...        ...         ...

Can someone help? Many thanks in advance.


Solution

  • Here is a mockup you can build the rest from. The idea is to write your own regression function and pass each group's dataframe to it using apply.

    Let me know what you think.

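    import pandas as pd
    import statsmodels.api as sm

    def GroupRegress(data, yvar, xvars):
        """Fit an OLS regression of yvar on xvars (plus an intercept) for one group."""
        Y = data[yvar]
        # Copy so that adding the intercept column does not write into a slice of the original frame
        X = data[xvars].copy()
        X['intercept'] = 1.
        result = sm.OLS(Y, X).fit()
        # result.tvalues and result.rsquared hold the t-stats and R-squared
        return result.params

    df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 2],
                       'Y': [9, 5, 3, 1, 2, 3],
                       'X': [3, 4, 1, 6, 4, 9]})

    # Run the regression once per group; each group's params become one row
    df.groupby('group').apply(GroupRegress, 'Y', ['X'])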
    

    Result below:

                  X  intercept
    group
    1      1.000000        3.0
    2      0.236842        0.5