How to use the confusion matrix module in NLTK?

I followed the NLTK book in using the confusion matrix, but the resulting confusion matrix looks very odd.

# empirically examine where the tagger is making mistakes
test_tags = [tag for sent in brown.sents(categories='editorial')
    for (word, tag) in t2.tag(sent)]
gold_tags = [tag for (word, tag) in brown.tagged_words(categories='editorial')]
print nltk.ConfusionMatrix(gold_tags, test_tags)

Can anyone explain how to use the confusion matrix?


  • First, I assume you got the code from chapter 05 of the old NLTK book, in particular the section that examines where the tagger makes mistakes.

    Now, let's look at the confusion matrix in NLTK. Try:

    from nltk.metrics import ConfusionMatrix
    ref  = 'DET NN VB DET JJ NN NN IN DET NN'.split()
    tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()
    cm = ConfusionMatrix(ref, tagged)
    print cm


        | D         |
        | E I J N V |
        | T N J N B |
    DET |<3>. . . . |
     IN | .<1>. . . |
     JJ | . .<.>1 . |
     NN | . . .<3>1 |
     VB | . . . .<1>|
    (row = reference; col = test)

    The numbers enclosed in <> are the true positives (tp). In the example above, one JJ in the reference was wrongly tagged as NN in the tagged output. That instance counts as one false positive for NN and one false negative for JJ.
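
    To see where those cells come from, here is a minimal sketch that rebuilds the same counts in plain Python, without NLTK. It assumes the matrix is simply a tally of (reference, test) tag pairs at each position; the names `pair_counts`, `correct` and `errors` are made up for illustration and are not part of NLTK.

```python
from collections import Counter

ref = 'DET NN VB DET JJ NN NN IN DET NN'.split()
tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()

# Tally each (reference tag, test tag) pair; pair_counts is a hypothetical name.
pair_counts = Counter(zip(ref, tagged))

# Diagonal cells (same tag on both sides) are the correct predictions.
correct = {r: n for (r, t), n in pair_counts.items() if r == t}
# Off-diagonal cells are the confusions, keyed by (reference, test).
errors = {(r, t): n for (r, t), n in pair_counts.items() if r != t}

print(correct)
print(errors)
```

    The two entries in `errors` correspond to the two off-diagonal 1s in the printed matrix: one in the JJ row under NN, and one in the NN row under VB.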

    To pull the counts needed for precision/recall/fscore out of the confusion matrix, you can collect the true positives, false negatives and false positives like this:

    from collections import Counter

    labels = set('DET NN VB IN JJ'.split())
    true_positives = Counter()
    false_negatives = Counter()
    false_positives = Counter()
    for i in labels:        # i = reference tag (row)
        for j in labels:    # j = test tag (column)
            if i == j:
                # diagonal cell: the tag was predicted correctly
                true_positives[i] += cm[i,j]
            else:
                # off-diagonal cell: i was missed, j was wrongly predicted
                false_negatives[i] += cm[i,j]
                false_positives[j] += cm[i,j]
    print "TP:", sum(true_positives.values()), true_positives
    print "FN:", sum(false_negatives.values()), false_negatives
    print "FP:", sum(false_positives.values()), false_positives


    TP: 8 Counter({'DET': 3, 'NN': 3, 'VB': 1, 'IN': 1, 'JJ': 0})
    FN: 2 Counter({'NN': 1, 'JJ': 1, 'VB': 0, 'DET': 0, 'IN': 0})
    FP: 2 Counter({'VB': 1, 'NN': 1, 'DET': 0, 'JJ': 0, 'IN': 0})

    To calculate Fscore per label:

    for i in sorted(labels):
        if true_positives[i] == 0:
            # no correct predictions: avoid dividing by zero
            fscore = 0
        else:
            precision = true_positives[i] / float(true_positives[i]+false_positives[i])
            recall = true_positives[i] / float(true_positives[i]+false_negatives[i])
            fscore = 2 * (precision * recall) / float(precision + recall)
        print i, fscore


    DET 1.0
    IN 1.0
    JJ 0
    NN 0.75
    VB 0.666666666667
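
    The per-label arithmetic can also be wrapped in a small helper. This is only a sketch: `prf` is a hypothetical name, and it takes the tp/fp/fn counts directly instead of reading them from `cm`. It assumes you want zeros returned whenever a denominator would be zero.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, fscore) for one label from raw counts."""
    # Guard each ratio so that an empty denominator yields 0.0.
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore

# Counts taken from the TP/FN/FP output above.
print(prf(3, 1, 1))  # NN
print(prf(1, 1, 0))  # VB
print(prf(0, 0, 1))  # JJ
```

    Returning precision and recall alongside the fscore makes it easier to see why a label scores low: for VB, recall is perfect but half of the VB predictions are wrong.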

    I hope the above de-confuses the confusion matrix usage in NLTK. Here's the full code for the example above:

    from collections import Counter
    from nltk.metrics import ConfusionMatrix
    ref  = 'DET NN VB DET JJ NN NN IN DET NN'.split()
    tagged = 'DET VB VB DET NN NN NN IN DET NN'.split()
    cm = ConfusionMatrix(ref, tagged)
    print cm
    labels = set('DET NN VB IN JJ'.split())
    true_positives = Counter()
    false_negatives = Counter()
    false_positives = Counter()
    for i in labels:        # i = reference tag (row)
        for j in labels:    # j = test tag (column)
            if i == j:
                true_positives[i] += cm[i,j]
            else:
                false_negatives[i] += cm[i,j]
                false_positives[j] += cm[i,j]
    print "TP:", sum(true_positives.values()), true_positives
    print "FN:", sum(false_negatives.values()), false_negatives
    print "FP:", sum(false_positives.values()), false_positives
    for i in sorted(labels):
        if true_positives[i] == 0:
            fscore = 0
        else:
            precision = true_positives[i] / float(true_positives[i]+false_positives[i])
            recall = true_positives[i] / float(true_positives[i]+false_negatives[i])
            fscore = 2 * (precision * recall) / float(precision + recall)
        print i, fscore