I was looking at the following official documentation from statsmodels:
But when I try to run this code on a practice dataset (with statsmodels.api already imported as sm):
variance_inflation_factor=sm.stats.outliers_influence.variance_inflation_factor()
vif=pd.DataFrame()
vif['VIF']=[variance_inflation_factor(X_train.values,i) for i in range(X_train.shape[1])]
vif['Predictors']=X_train.columns
I get the error message: module 'statsmodels.stats.api' has no attribute 'outliers_influence'
Can anyone tell me the appropriate way to get this working?
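The line
variance_inflation_factor=sm.stats.outliers_influence.variance_inflation_factor()
is the problem, for two reasons. First, sm.stats refers to statsmodels.stats.api, which does not expose outliers_influence, hence the error message. Second, the trailing () calls the function with no arguments instead of referring to it. variance_inflation_factor is a function that takes two inputs, the matrix of predictors and the index of the column to check, so import it directly from its module and call it with both: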
import pandas as pd
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_train = pd.DataFrame(np.random.standard_normal((1000,5)), columns=[f"x{i}" for i in range(5)])
vif=pd.DataFrame()
vif['VIF']=[variance_inflation_factor(X_train.values,i) for i in range(X_train.shape[1])]
vif['Predictors']=X_train.columns
print(vif)
which produces (the exact values vary between runs, since the data are random and unseeded):
        VIF Predictors
0  1.002882         x0
1  1.004265         x1
2  1.001945         x2
3  1.004227         x3
4  1.003989         x4