python pandas regression statsmodels standard-deviation

Find RMSE and Standard Deviation of a StatsModels Multiple Regression


I currently have a multiple regression that generates an OLS summary of life expectancy and the variables that affect it. However, the summary does not include the RMSE or the standard deviation. Does statsmodels have an RMSE function, and is there a way to calculate the standard deviation from my code?

I found a previous question on this problem (regression model statsmodel python) and read the statsmodels documentation page for rmse: https://www.statsmodels.org/stable/generated/statsmodels.tools.eval_measures.rmse.html. After testing, I still have not been able to resolve this.

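import pandas as pd
import openpyxl  # Excel engine used by pd.read_excel for .xlsx files
import statsmodels.formula.api as smf

# The file path must be a string
df = pd.read_excel("C:/Users/File1.xlsx", sheet_name="States")

# Keep only the rows for Maine
dfME = df[df["State"] == "Maine"]

pd.set_option('display.max_columns', None)

dfME.head()

# The formula is a string; Q() quotes a column name that contains a space
model = smf.ols('Q("Life Expectancy") ~ Race + Age + Weight + C(Pets)', data=dfME)
modelfit = model.fit()
modelfit.summary()  # summary is a method, so it needs parentheses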

Solution

  • It sounds like you want the standard deviation of the residuals, which is the square root of the residual mean squared error (the RMSE of the residuals). It measures how far the observed values typically fall from the fitted regression, and it is often used as a measure of prediction error.

    The statsmodels summary leaves out a lot of information, but the fitted results object exposes it directly. You can find the list of available attributes and methods in the statsmodels documentation: Regression Results

    Let's use the variable modelfit from your code. The mean squared error of the residuals is available as the mse_resid attribute of the fitted results listed in that documentation (it is an attribute, not a method, so it takes no parentheses). To get the RMSE (root mean squared error) of the residuals, take the square root of mse_resid with NumPy's sqrt function.

    Thus the Root Mean Squared Error of the Residuals can be found using this code:

    import numpy as np
    rmse_residuals = np.sqrt(modelfit.mse_resid)
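
The question also asks about the rmse function in statsmodels.tools.eval_measures. As a rough sketch of how the two quantities relate, the example below fits a small model on synthetic data. The DataFrame demo and the columns x1, x2, and y are made up for illustration and are not from the original post. One detail worth knowing: mse_resid divides the sum of squared residuals by the residual degrees of freedom, while eval_measures.rmse takes a plain mean over the observations, so the two values differ slightly.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tools.eval_measures import rmse

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 50
demo = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
demo["y"] = 1.0 + 2.0 * demo["x1"] - 0.5 * demo["x2"] + rng.normal(scale=0.3, size=n)

fit = smf.ols("y ~ x1 + x2", data=demo).fit()

# Square root of SSR / df_resid (residual degrees of freedom)
resid_std_error = np.sqrt(fit.mse_resid)

# Square root of SSR / n (plain mean of the squared errors)
plain_rmse = rmse(demo["y"], fit.fittedvalues)

print("sqrt(mse_resid):    ", resid_std_error)
print("eval_measures.rmse: ", plain_rmse)
```

Which one to report depends on the goal: the degrees-of-freedom version estimates the standard deviation of the error term, while the plain RMSE describes the average in-sample prediction error.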