PyTorch metrics: multi-label confusion matrix raises a different-device error


I'm trying to calculate the confusion matrix using torchmetrics for my multi-label output, but I get the following error:

File "/home/antpc/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 394, in wrapped_func
    raise RuntimeError(
RuntimeError: Encountered different devices in metric calculation (see stacktrace for details).This could be due to the metric class not being on the same device as input.Instead of `metric=ConfusionMatrix(...)` try to do `metric=ConfusionMatrix(...).to(device)` where device corresponds to the device of the input.

My code:

from torchmetrics import ConfusionMatrix
def calculate_metrics(predictions, targets):
    cm = ConfusionMatrix(num_classes=34, multilabel=True)
    matrix = cm(predictions, targets)
    return matrix

I then changed my code to move both the metric and the inputs to the CPU:

from torchmetrics import ConfusionMatrix
def calculate_metrics(predictions, targets):
    cm = ConfusionMatrix(num_classes=34, multilabel=True).to(device='cpu')
    matrix = cm(predictions.detach().cpu(), targets.detach().cpu())
    return matrix

It still shows the same error. Can anyone help me with this?

Please don't suggest using sklearn.metrics.multilabel_confusion_matrix.


Solution

  • The error was not caused by the metric itself. It came from running PyTorch Lightning on multiple GPUs with the 'dp' strategy.

    My previous code:

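    import pytorch_lightning as pl

    model = ModelClassifier()
    # 'dp' (DataParallel) runs a single process and splits each batch across the GPUs
    trainer = pl.Trainer(strategy='dp', max_epochs=150, gpus=8, fast_dev_run=True)
    trainer.fit(model, train_loader)
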

    The error was resolved after changing the strategy from 'dp' to 'ddp' and removing fast_dev_run=True.

    Working code:

    model = ModelClassifier()
    # 'ddp' (DistributedDataParallel) runs one process per GPU
    trainer = pl.Trainer(strategy='ddp', max_epochs=150, gpus=8)
    trainer.fit(model, train_loader)
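
For anyone hitting the same message, it may help to see why a stateful metric cares about devices at all. The sketch below is only an illustration under stated assumptions: TinyMultilabelConfusion and Classifier are hypothetical names made up for this example, not the torchmetrics or Lightning classes, and the per-label layout [[TN, FP], [FN, TP]] is simply the one chosen here. It keeps the accumulated counts in a buffer, which is the kind of state that has to sit on the same device as the inputs, and it registers the metric as an attribute of the model so that a single .to(device) call moves both together.

```python
import torch
from torch import nn


class TinyMultilabelConfusion(nn.Module):
    """Hypothetical stand-in for a stateful metric (not the torchmetrics class)."""

    def __init__(self, num_labels: int, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold
        # Buffers follow .to(device), just like parameters do.
        self.register_buffer(
            "confmat", torch.zeros(num_labels, 2, 2, dtype=torch.long)
        )

    def forward(self, preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pred_pos = preds >= self.threshold  # (batch, num_labels), bool
        true_pos = target.bool()
        tp = (pred_pos & true_pos).sum(dim=0)
        fp = (pred_pos & ~true_pos).sum(dim=0)
        fn = (~pred_pos & true_pos).sum(dim=0)
        tn = (~pred_pos & ~true_pos).sum(dim=0)
        # Layout chosen for this sketch: one [[TN, FP], [FN, TP]] matrix per label.
        batch = torch.stack([tn, fp, fn, tp], dim=1).reshape(-1, 2, 2)
        # Adding to the stored state is where mismatched devices would fail.
        self.confmat += batch
        return batch


class Classifier(nn.Module):
    """Hypothetical model that owns its metric as a submodule."""

    def __init__(self, num_features: int, num_labels: int):
        super().__init__()
        self.linear = nn.Linear(num_features, num_labels)
        # Registered as an attribute, so Classifier.to(device) moves it too.
        self.confusion = TinyMultilabelConfusion(num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(x))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Classifier(num_features=4, num_labels=3).to(device)

# Small hand-made batch: 2 samples, 3 labels.
preds = torch.tensor([[0.9, 0.2, 0.7], [0.1, 0.8, 0.6]], device=device)
target = torch.tensor([[1, 0, 0], [0, 1, 1]], device=device)
matrix = model.confusion(preds, target)
print(matrix.shape)  # torch.Size([3, 2, 2])
```

As far as I understand the torchmetrics documentation, real metrics behave similarly when they are defined as attributes of an nn.Module or LightningModule, which is why creating the metric inside the module's __init__ is usually preferred over creating it inside a helper function on every call.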