multidimensional-array deep-learning pytorch tensor torchvision

Why use dim=0 with torch.softmax() to get prediction probabilities?


I was watching a tutorial in which the author computes prediction probabilities from the logits using softmax with dim=0. Why? Doesn't dim=0 mean taking the softmax across the rows? Shouldn't we use dim=1 instead? When we want the class ID we use torch.argmax(..., dim=1), because each row holds the scores of the different classes for one sample. So why not use dim=1 here as well?

What is the difference between these two (getting the class ID with argmax versus getting the probabilities with softmax)?

I read some answers to other questions, but I didn't understand them.


Solution

  • It's hard to say why the tutorial does this, but usually the input to the network has shape (batch_number, *your_data), and for classification the output usually has shape (batch_number, number_of_classes). In that case you're right: you should use dim=1 (or, preferably, dim=-1, which also handles more complicated outputs such as (batch_number, some_more_data, ..., number_of_classes)) so that the model's confidences along the class dimension sum to 1. However, some network architectures reshape the data for their own purposes, so check which dimension actually holds number_of_classes.

    As for the other part of the question: softmax returns the confidences across the classes, while argmax returns the index of the single class with the highest confidence for each sample. Usually you apply softmax and then argmax to get the final class index. Because softmax preserves the ordering of the logits, argmax on the raw logits gives the same index, so softmax is only needed when you want the probabilities themselves.

    Hope it helps
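A minimal sketch of the points above, assuming PyTorch and made-up random logits (the tensor names and shapes are illustrative, not taken from the tutorial). It contrasts dim=1 with dim=0 on a (batch_number, number_of_classes) output, shows one plausible reason a tutorial might use dim=0 (logits for a single unbatched sample are 1-D, so dim 0 is the class dimension), and checks that argmax gives the same class with or without softmax.

```python
import torch

torch.manual_seed(0)

# Hypothetical logits for a batch of 4 samples and 3 classes:
# shape (batch_number, number_of_classes)
logits = torch.randn(4, 3)

# Normalize over the class dimension: each row becomes a probability
# distribution for one sample (dim=-1 is the same axis for a 2-D tensor)
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))          # every row sums to 1 (up to float rounding)

# dim=0 normalizes over the batch instead: each column sums to 1,
# mixing different samples, so rows are no longer per-sample distributions
over_batch = torch.softmax(logits, dim=0)
print(over_batch.sum(dim=0))     # every column sums to 1 (up to float rounding)

# For a single unbatched sample the logits are 1-D, shape (number_of_classes,),
# so dim=0 *is* the class dimension and matches row 0 of the batched result
single = logits[0]
print(torch.allclose(torch.softmax(single, dim=0), probs[0]))  # True

# softmax -> argmax: index of the most likely class for each sample,
# plus its probability (Tensor.max(dim) returns values and indices)
confidence, pred = probs.max(dim=1)
print(pred, confidence)

# softmax is strictly increasing within each row, so it never changes which
# entry is largest: argmax over the raw logits gives the same class indices
print(torch.equal(pred, torch.argmax(logits, dim=1)))  # True
```

The same check applies to other layouts: a segmentation-style output of shape (N, C, H, W), for example, keeps the classes in dim=1, so dim=-1 would normalize over the image width instead. Print the output shape before choosing dim.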