I am trying to construct a confusion matrix without using the sklearn library, but I am having trouble forming it correctly. Here's my code:
import numpy as np

def comp_confmat():
    currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
    predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
    cm = []
    classes = int(max(currentDataClass) - min(currentDataClass)) + 1  # find number of classes
    for c1 in range(1, classes + 1):  # for every true class
        counts = []
        for c2 in range(1, classes + 1):  # for every predicted class
            count = 0
            for p in range(len(currentDataClass)):
                if currentDataClass[p] == predictedClass[p]:
                    count += 1
            counts.append(count)
        cm.append(counts)
    print(np.reshape(cm, (classes, classes)))
However this returns:
[[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]]
But I don't understand why every cell ends up as 7 when I am resetting the count each time and the loops run over different values.
This is what I should be getting (using sklearn's confusion_matrix function):
[[3 0 0 0 1]
[2 1 0 1 0]
[0 1 3 0 0]
[0 1 0 0 0]
[0 1 1 0 0]]
In your innermost loop, there should be a case distinction: currently the loop counts every sample where the true and predicted labels agree, regardless of c1 and c2, so every cell receives the total number of correct predictions (7 here). You only want to count a sample for the cell (c1, c2) when currentDataClass[p] == c1 and predictedClass[p] == c2.
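For completeness, here is a minimal sketch of your loop with that case distinction added. I've rewritten comp_confmat to take the two lists as parameters and return the matrix instead of printing it; it still assumes integer class labels starting at 1, as in your data:

```python
import numpy as np

def comp_confmat(actual, predicted):
    classes = int(max(actual) - min(actual)) + 1  # number of classes
    cm = []
    for c1 in range(1, classes + 1):          # for every true class
        counts = []
        for c2 in range(1, classes + 1):      # for every predicted class
            count = 0
            for p in range(len(actual)):
                # count only the samples that fall in cell (c1, c2)
                if actual[p] == c1 and predicted[p] == c2:
                    count += 1
            counts.append(count)
        cm.append(counts)
    return np.array(cm)

currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
print(comp_confmat(currentDataClass, predictedClass))
```

With your data this prints the same matrix sklearn produces.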
Here's another way, using nested list comprehensions:
currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
classes = int(max(currentDataClass) - min(currentDataClass)) + 1  # find number of classes
counts = [[sum([(currentDataClass[i] == true_class) and (predictedClass[i] == pred_class)
                for i in range(len(currentDataClass))])
           for pred_class in range(1, classes + 1)]
          for true_class in range(1, classes + 1)]
counts
counts
[[3, 0, 0, 0, 1],
[2, 1, 0, 1, 0],
[0, 1, 3, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 1, 0, 0]]
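Since numpy is already in use, the same matrix can also be built without any explicit Python loops. This is a sketch using np.add.at (unbuffered indexed addition), again assuming the labels are the integers 1..classes:

```python
import numpy as np

actual = np.array([1,3,3,2,5,5,3,2,1,4,3,2,1,1,2])
predicted = np.array([1,2,3,4,2,3,3,2,1,2,3,1,5,1,1])
n = actual.max() - actual.min() + 1

# add 1 to cell (true - 1, predicted - 1) for each sample
cm = np.zeros((n, n), dtype=int)
np.add.at(cm, (actual - 1, predicted - 1), 1)
print(cm)
```

np.add.at is used instead of plain fancy-index assignment so that repeated (true, predicted) pairs accumulate rather than overwrite each other.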