I am trying to run the code below:
heart_df = pd.read_csv(r"location")
X = heart_df.iloc[:, :-1].values
y = heart_df.iloc[:, 11].values
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values() #this is line 17
cat_cols = new_df.copy()
and I get the following IndexError:
File "***location***", line 17, in <module>
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
As far as I know, this IndexError is raised when float numbers are used as indices, so I don't understand why it occurs in this case.
Here, by creating new_df and then cat_cols, I want to separate the categorical columns to apply OneHotEncoding at a later stage.
The dataset is here: https://www.kaggle.com/fedesoriano/heart-failure-prediction.
The error is caused by this line:
X = heart_df.iloc[:, :-1].values
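The .values attribute converts the DataFrame to a NumPy array. A NumPy array has no column labels and can only be indexed with integers, slices, or integer/boolean arrays, so indexing X with a list of column names raises the IndexError.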
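The behaviour can be reproduced on a small scale. The sketch below uses a made-up three-row frame (`toy_df` and its values are illustrative assumptions, not the Kaggle data) to show that label-based indexing fails once `.values` has been applied, while positional indexing still works:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the heart dataset (values are illustrative only)
toy_df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "HeartDisease": [0, 1, 0],
})

# .values returns a plain NumPy array, so the column labels are gone
X_array = toy_df.iloc[:, :-1].values

try:
    X_array[["Sex"]]  # a list of strings is not a valid ndarray index
    raised = False
except IndexError:
    raised = True

# Positional indexing still works on the array: column 1 is "Sex"
sex_by_position = X_array[:, 1]
```

Here `raised` ends up `True`, which matches the traceback in the question.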
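So keep X as a DataFrame, which allows selecting columns by name, and drop the trailing .values() (values is an attribute, not a method, so calling it would fail as well):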
X = heart_df.iloc[:, :-1]
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]]
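As a rough end-to-end sketch of the fix, the example below keeps `X` as a DataFrame, selects the categorical columns by name, and one-hot encodes them. The data is made up, only two of the five categorical columns are included for brevity, and `pd.get_dummies` is used here as a stand-in for whatever encoder you plan to apply later; these are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Made-up stand-in for the heart dataset (values are illustrative only)
toy_df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA"],
    "HeartDisease": [0, 1, 0],
})

X = toy_df.iloc[:, :-1]        # no .values: X stays a DataFrame
y = toy_df.iloc[:, -1].values  # the target can safely become an array

# Select the categorical columns by label and work on a copy
cat_names = ["Sex", "ChestPainType"]
cat_cols = X[cat_names].copy()

# One-hot encode with pandas (stand-in for a later encoding step)
encoded = pd.get_dummies(cat_cols)
```

If a NumPy array is needed afterwards, for example to feed a model, it can be obtained at that point with `encoded.to_numpy()`, after all label-based selection is done.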