Tags: python, linear-regression, statsmodels, logarithm, patsy

Converting a simple regression to a logarithmic scale with patsy, statsmodels


I am following an online econometrics course and learning statsmodels as I go.

I know from the instructor that this regression will have a better fit on a logarithmic scale, but I don't know how or where to convert my data / formula.

I am using Python, pandas, statsmodels and patsy.

Here is where I converted the data to dmatrices:

    from patsy import dmatrices
    y, X = dmatrices('PRICE ~ QUANTITY', data=df, return_type='dataframe')

Here is where I ran the regression in statsmodels:

    import statsmodels.api as sm

    mod = sm.OLS(y, X)      # Describe model
    res = mod.fit()         # Fit model
    print(res.summary())    # Summarize model

I get a very low R-squared, but the model does run. I'm just trying to figure out how to convert to a log scale. In the example given in the course, the instructor converted both the X and Y axes to log scales.

EDIT: I got it to work using this:

    df2['Quantity'] = np.log(df['QUANTITY'])
    df2['Price'] = np.log(df['PRICE'])

Is there a way to get that done in one line of code, or with a loop if I needed to do it for a few more variables in another problem?


Solution

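  • A simple loop that also renames each column to include "log" could be:

    columns = ['QUANTITY', 'PRICE', 'aaa']  # 'aaa' stands in for any further column
    for col in columns:
        # underscore rather than hyphen, so the new name stays usable in a patsy formula
        df2["log_" + col] = np.log(df[col])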
    

    It is also possible to call np.log directly inside the formula, for example 'np.log(PRICE) ~ np.log(QUANTITY)'. statsmodels does not provide any further support in that case, though, and the log is recomputed each time the regression or formula is run instead of being computed once for the relevant columns of the dataframe.