python numpy machine-learning probability

How do I fix an IndexError in NumPy when scoring a machine learning model?


I am trying to predict the probability that a person uses a certain drug. One of the key predictions raises an IndexError.

I've used the same structure for other predictions without any problems.

import numpy as np
import pandas as pd
import sklearn.ensemble as skle
drug = pd.read_sas('C:/Users/hamee/Downloads/DUQ_I.xpt')
mod = skle.RandomForestClassifier()
fitmod = mod.fit(drug2[["DUQ200", "DUQ240", "DUQ250", "DUQ290", "DUQ330", "DUQ370"]], drug2["DUQ240"])
Pred = fitmod.predict_proba(drug2[["DUQ200", "DUQ240", "DUQ250", "DUQ290", "DUQ330", "DUQ370"]])
Brier = np.mean((Pred[:,1]-drug2["DUQ290"]**2))

I expected a decimal number as output. Instead, I got:

IndexError                                Traceback (most recent call last)
<ipython-input-19-90c24bde1c32> in <module>
----> 1 Brier = np.mean((Pred[:,1]-drug2["DUQ290"]**2))

IndexError: index 1 is out of bounds for axis 1 with size 1

Solution

  • Assuming everything up to your Pred is working correctly

    Check Pred.shape first. predict_proba returns a two-dimensional array of shape (n_samples, n_classes), and the error message says that axis 1 has size 1, so Pred has a single column and Pred[:,1] asks for a second column that does not exist.

    print(Pred.shape)
    

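    Update: since your Pred.shape is (539, 1), the only valid column index is 0, as array indices start from 0:

    Brier = np.mean((Pred[:,0]-drug2["DUQ290"])**2)
    

    Note that the square now applies to the whole difference, which is what a Brier score needs. In the original expression only drug2["DUQ290"] was squared. A single column also means the classifier saw only one class in the target drug2["DUQ240"], so every predicted probability is 1.0. Check that the target contains both classes (and is not also one of the features) before trusting the score.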
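
To make the failure and the fix concrete, here is a minimal NumPy-only sketch. It does not use your data: `pred_one_col` and `pred_two_col` are made-up stand-ins for the output of predict_proba, and `positive_class_proba` and `brier_score` are hypothetical helper names. It also assumes the outcome is coded 0/1, which a Brier score requires.

```python
import numpy as np


def positive_class_proba(proba, positive_col=1):
    """Return one column of a predict_proba-style array.

    Raises a descriptive ValueError instead of a bare IndexError when the
    requested column does not exist (e.g. the model saw only one class).
    """
    proba = np.asarray(proba, dtype=float)
    if proba.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {proba.shape}")
    if positive_col >= proba.shape[1]:
        raise ValueError(
            f"column {positive_col} requested, but the array has only "
            f"{proba.shape[1]} column(s); was the model fit on one class?"
        )
    return proba[:, positive_col]


def brier_score(p, y):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))  # square the difference, then average


# Made-up single-column output: reproduces the error from the question.
pred_one_col = np.ones((5, 1))
try:
    pred_one_col[:, 1]
except IndexError as exc:
    print("IndexError:", exc)

# Made-up two-column output: column 1 is the positive-class probability.
pred_two_col = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
y = np.array([0, 1, 0])
score = brier_score(positive_class_proba(pred_two_col), y)
print(score)
```

With a real model, fitting on a target that contains both classes should give predict_proba two columns, and the same helpers would apply unchanged.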