I would like to compute the f1-score for a classifier trained with allen-nlp. I used the working code from a allen-nlp guide, which computed accuracy, not F1, so I tried to adjust the metric in the code.
According to the documentation, CategoricalAccuracy and FBetaMultiLabelMeasure take the same inputs. (predictions: torch.Tensor
of shape [batch_size, ..., num_classes]
, gold_labels: torch.Tensor
of shape [batch_size, ...]
)
But for some reason the input that worked perfectly well for the accuracy results in a RuntimeError when given to the f1-multi-label metric.
I condensed the problem to the following code snippet:
>>> from allennlp.training.metrics import CategoricalAccuracy, FBetaMultiLabelMeasure
>>> import torch
>>> labels = torch.LongTensor([0, 0, 2, 1, 0])
>>> logits = torch.FloatTensor([[ 0.0063, -0.0118, 0.1857], [ 0.0013, -0.0217, 0.0356], [-0.0028, -0.0512, 0.0253], [-0.0460, -0.0347, 0.0400], [-0.0418, 0.0254, 0.1001]])
>>> labels.shape
torch.Size([5])
>>> logits.shape
torch.Size([5, 3])
>>> ca = CategoricalAccuracy()
>>> f1 = FBetaMultiLabelMeasure()
>>> ca(logits, labels)
>>> f1(logits, labels)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/python3.8/site-packages/allennlp/training/metrics/fbeta_multi_label_measure.py", line 130, in __call__
true_positives = (gold_labels * threshold_predictions).bool() & mask & pred_mask
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 1
Why is this error happening? What am I missing here?
You want to use FBetaMeasure
, not FBetaMultiLabelMeasure
. "Multilabel" means you can specify more than one correct answer, but "Categorical Accuracy" only allows one correct answer. That means you have to specify another dimension in your labels.
I suspect the documentation of FBetaMultiLabelMeasure
is misleading. I'll look into fixing it.