Hi all, I'm working through ISLP and learning a lot. I'm doing an exercise (Chapter 3, exercise 9, part i) that asks whether there is a relationship between the predictors and the response, using `anova_lm`. I don't see the point of doing this instead of just looking at the F-statistic, since the full model's output already answers the question. Still, I'm wondering whether I could loop through the columns and, on each iteration, compare the null model with a model fitted on that single predictor.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from ISLP.models import ModelSpec as MS
from statsmodels.stats.anova import anova_lm

df = pd.read_csv('df.csv', na_values=['?'])  # na_values: turn any '?' into NaN
y = df['response']

# null model (intercept only)
Xinter = pd.DataFrame({'intercept': np.ones(len(df), dtype='float')})  # df has 397 rows
fit_inter = sm.OLS(y, Xinter)
ris_inter = fit_inter.fit()

# remaining columns of my df (every column except the response)
df_colonne_rimaste = df.columns.drop(['response'])

# iterate through df_colonne_rimaste
for i in df_colonne_rimaste:
    print(i)  # show the column I'm working with
    X = MS([i]).fit_transform(df)  # build the model matrix
    modelx = sm.OLS(y, X, missing='drop')  # specify the model
    resultx = modelx.fit()  # estimate the parameters
    print('Comparison between the null model and the model with {}:\n{}'.format(i, anova_lm(ris_inter, resultx)))
```
I get this message:

```
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```
I tried other things I can't remember anymore. I tried hard.
Where is the problem?
Thanks a lot
Problem solved. It seems that when rows are dropped, the index is not renumbered, so the indexes of X and y no longer matched.
Since pandas aligns operations on index labels, and statsmodels uses the index of pandas inputs, resetting the index on X and y (`.reset_index(drop=True)`, so the old index is not added as an extra column) did the job. Thanks a lot