I have a function:
def create_variables(name, probabilities, labels):
print('function called')
model = Metrics(probabilities, labels)
prec_curve = model.precision_curve()
kappa_curve = model.kappa_curve()
tpr_curve = model.tpr_curve()
fpr_curve = model.fpr_curve()
pr_auc = auc(tpr_curve, prec_curve)
roc_auc = auc(fpr_curve, tpr_curve)
auk = auc(fpr_curve, kappa_curve)
return [name, prec_curve, kappa_curve, tpr_curve, fpr_curve, pr_auc, roc_auc, auk]
I have the following variables:
svm = pd.read_csv('SVM.csv')
svm_prob_1 = svm.probability[svm.fold_number == 1]
svm_prob_2 = svm.probability[svm.fold_number == 2]
svm_label_1 = svm.true_label[svm.fold_number == 1]
svm_label_2 = svm.true_label[svm.fold_number == 2]
I want to execute the following lines:
svm1 = create_variables('svm_fold1', svm_prob_1, svm_label_1)
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
Python works as expected for svm1. However, when it starts processing svm2, I receive the following error:
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
function called
Traceback (most recent call last):
File "<ipython-input-742-702cfac4d100>", line 1, in <module>
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_2)
File "<ipython-input-741-b8b5a84f0298>", line 6, in create_variables
prec_curve = model.precision_curve()
File "<ipython-input-734-dd9c309be961>", line 59, in precision_curve
self.tp, self.tn, self.fp, self.fn = self.confusion_matrix(self.preds)
File "<ipython-input-734-dd9c309be961>", line 72, in confusion_matrix
if pred == self.labels[i]:
File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py", line 1068, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\20200016\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4730, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 992, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
svm_prob_1
and svm_prob_2
are both of the same shape and contain non-zero values. svm_label_2
contains 0's and 1's and has the same length as svm_prob_2.
Furthermore, the error seems to be in svm_label_1
. After changing this variable, the following line does work:
svm2 = create_variables('svm_fold2', svm_prob_2, svm_label_1
Based on the code below, there seems to be no difference between svm_label_1
and svm_label_2
though.
type(svm_label_1)
Out[806]: pandas.core.series.Series
type(svm_label_2)
Out[807]: pandas.core.series.Series
min(svm_label_1)
Out[808]: 0
min(svm_label_2)
Out[809]: 0
max(svm_label_1)
Out[810]: 1
max(svm_label_2)
Out[811]: 1
sum(svm_label_1)
Out[812]: 81
sum(svm_label_2)
Out[813]: 89
len(svm_label_1)
Out[814]: 856
len(svm_label_2)
Out[815]: 856
Does anyone know what's going wrong here?
I don't know why it works, but converting svm_label_2
into a list worked:
svm_label_2 = list(svm.true_label[svm.fold_number == 2])
Since, svm_label_1
and svm_label_2
are of the same type, I don't understand why the latter raised an error and the first one did not. Therefore, I still welcome any explanation to this phenomenon.