python-3.x, one-hot-encoding, index-error

Getting unexpected IndexError when creating a dataframe


I am trying to execute the below code:

heart_df = pd.read_csv(r"location")
X = heart_df.iloc[:, :-1].values
y = heart_df.iloc[:, 11].values

new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values() #this is line 17

cat_cols = new_df.copy()

and I get this IndexError:

  File "***location***", line 17, in <module>
  new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

As far as I know, this IndexError occurs when floats are used as indices, but I don't understand why it occurs in this case.

Here, by creating new_df and then cat_cols, I want to separate the categorical columns to apply OneHotEncoding at a later stage.

The dataset is here: https://www.kaggle.com/fedesoriano/heart-failure-prediction.


Solution

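  • The error comes from this line:

    X = heart_df.iloc[:, :-1].values

    The .values attribute converts the DataFrame to a NumPy array. A NumPy array has no column labels, so it cannot be indexed with a list of column names; that is what raises the IndexError on line 17. Note also that .values is an attribute, not a method, so the trailing .values() on that line would fail as well.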

    So keep X as a DataFrame and select the columns by name:

    X = heart_df.iloc[:, :-1]
    new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]]
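To see the difference end to end, here is a minimal sketch that reproduces the error and then applies the fix. The rows in heart_df are made up and only mimic the column names of the Kaggle dataset; they are not real data. The last step uses pd.get_dummies as one possible way to one-hot encode; the question does not say which encoder is intended, so treat that part as an assumption (scikit-learn's OneHotEncoder would be another option).

```python
import pandas as pd

# Made-up rows that only mimic the dataset's column names; not real data.
heart_df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA"],
    "RestingECG": ["Normal", "Normal", "ST"],
    "ExerciseAngina": ["N", "N", "Y"],
    "ST_Slope": ["Up", "Flat", "Up"],
    "HeartDisease": [0, 1, 0],
})
cat_names = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]

# 1) Reproduce the problem: a NumPy array has no column labels,
#    so indexing it with a list of strings raises IndexError.
X_array = heart_df.iloc[:, :-1].values
try:
    X_array[cat_names]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# 2) Fix: keep X as a DataFrame and select the columns by label.
X = heart_df.iloc[:, :-1]
cat_cols = X[cat_names].copy()

# 3) Convert to NumPy only at the end, if an array is really needed.
cat_array = cat_cols.to_numpy()

# 4) One possible one-hot encoding step (an assumption, not the asker's code).
encoded = pd.get_dummies(cat_cols)
print(encoded.columns.tolist())
```

Selecting by label first and converting to an array afterwards keeps the column names available for as long as they are needed.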