python-3.x function class keyerror

KeyError: 0 - Function works initially, but returns error when called on other data


I have a function:

def create_variables(name, probabilities, labels):
    print('function called')
    model = Metrics(probabilities, labels)
    prec_curve = model.precision_curve()
    kappa_curve = model.kappa_curve()
    tpr_curve = model.tpr_curve()
    fpr_curve = model.fpr_curve()
    pr_auc = auc(tpr_curve, prec_curve)
    roc_auc = auc(fpr_curve, tpr_curve)
    auk = auc(fpr_curve, kappa_curve)
    
    return [name, prec_curve, kappa_curve, tpr_curve, fpr_curve, pr_auc, roc_auc, auk]

I have the following variables:

import pandas as pd

svm = pd.read_csv('SVM.csv')

svm_prob_1 = svm.probability[svm.fold_number == 1]
svm_prob_2 = svm.probability[svm.fold_number == 2]

svm_label_1 = svm.true_label[svm.fold_number == 1]
svm_label_2 = svm.true_label[svm.fold_number == 2]

I want to execute the following lines:

svm1 = create_variables('svm_fold1', svm_prob_1, svm_label_1)
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)

The call for svm1 works as expected. However, the call for svm2 raises the following error:

svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
function called
Traceback (most recent call last):

  File "<ipython-input-742-702cfac4d100>", line 1, in <module>
    svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)

  File "<ipython-input-741-b8b5a84f0298>", line 6, in create_variables
    prec_curve = model.precision_curve()

  File "<ipython-input-734-dd9c309be961>", line 59, in precision_curve
    self.tp, self.tn, self.fp, self.fn = self.confusion_matrix(self.preds)

  File "<ipython-input-734-dd9c309be961>", line 72, in confusion_matrix
    if pred == self.labels[i]:

  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)

  File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))

  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value

  File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item

KeyError: 0

svm_prob_1 and svm_prob_2 are both of the same shape and contain non-zero values. svm_label_2 contains 0's and 1's and has the same length as svm_prob_2.

Furthermore, the error seems to be in svm_label_2. After replacing it with svm_label_1, the following line does work:

svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_1)

Based on the code below, there seems to be no difference between svm_label_1 and svm_label_2 though.

type(svm_label_1)
Out[806]: pandas.core.series.Series

type(svm_label_2)
Out[807]: pandas.core.series.Series

min(svm_label_1)
Out[808]: 0

min(svm_label_2)
Out[809]: 0

max(svm_label_1)
Out[810]: 1

max(svm_label_2)
Out[811]: 1

sum(svm_label_1)
Out[812]: 81

sum(svm_label_2)
Out[813]: 89

len(svm_label_1)
Out[814]: 856

len(svm_label_2)
Out[815]: 856
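
The checks above compare type, values, and length, but not the index of each Series. As a minimal sketch of one further check, the snippet below uses a small made-up DataFrame in place of SVM.csv (the data is an assumption for illustration only) and compares the index labels of the two filtered subsets:

```python
import pandas as pd

# Hypothetical stand-in for SVM.csv: six rows, two folds.
svm = pd.DataFrame({
    "fold_number": [1, 1, 1, 2, 2, 2],
    "true_label": [0, 1, 0, 1, 0, 0],
})

label_1 = svm.true_label[svm.fold_number == 1]
label_2 = svm.true_label[svm.fold_number == 2]

# Same type and length, but the index labels differ.
print(list(label_1.index))  # [0, 1, 2]
print(list(label_2.index))  # [3, 4, 5]

# The label 0 exists only in the first subset.
has_zero_1 = 0 in label_1.index
has_zero_2 = 0 in label_2.index
print(has_zero_1, has_zero_2)  # True False
```

Running the same two checks (`.index` and `0 in ....index`) on the real svm_label_1 and svm_label_2 would show whether they differ in this way.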

Does anyone know what's going wrong here?


Solution

  • I don't know why it works, but converting svm_label_2 into a list worked:

    svm_label_2 = list(svm.true_label[svm.fold_number == 2])
    

    Since svm_label_1 and svm_label_2 are of the same type, I don't understand why the latter raised an error and the former did not. I would still welcome an explanation of this behaviour.
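
A likely explanation, assuming the fold 2 rows do not include row 0 of SVM.csv: filtering with a boolean mask keeps the original index labels, so svm_label_2 has no label 0. The traceback shows confusion_matrix evaluating self.labels[i], and on a Series with an integer index that is a lookup by label, not by position. A list is indexed by position, which is why the conversion works. The sketch below reproduces this with made-up data (not the real SVM.csv) and shows three ways to get positional access:

```python
import pandas as pd

# Made-up data standing in for SVM.csv (an assumption for illustration).
svm = pd.DataFrame({
    "fold_number": [1, 1, 2, 2],
    "true_label": [0, 1, 1, 0],
})

label_2 = svm.true_label[svm.fold_number == 2]  # index labels are 2 and 3

# With an integer index, label_2[0] is a label lookup, and label 0 is absent.
try:
    label_2[0]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Option 1: renumber the index from 0 so labels and positions coincide.
label_2_reset = label_2.reset_index(drop=True)
print(label_2_reset[0])  # 1

# Option 2: keep the index and ask for positions explicitly.
print(label_2.iloc[0])  # 1

# Option 3: drop the index entirely, much as the list conversion does.
print(label_2.to_numpy()[0])  # 1
```

Applied to the question, that would mean either calling reset_index(drop=True) on the filtered Series before passing them to create_variables, or using self.labels.iloc[i] inside the Metrics class.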