python, pytorch

Dimension of target and label in PyTorch


I know this is probably discussed somewhere, but I couldn't find it. I always have a mismatch of shapes in PyTorch between the target and the label. For a batch size of 64 I would get [64, 1] for the target and [64] for the label. I always fix this by calling label.view(-1, 1) inside the loss function.

I was wondering if there is a "best" way to fix this, because I could also just use target.view(-1) to get the same result, or even change the network's output with output.view(-1). Or maybe it would be better to use something like .reshape()?

The mismatch probably comes from inside the dataloader, which gets y_train and y_test as a Series rather than a DataFrame (since y = X.pop(target_name)), so y_train.values gives a 1D array. Should I fix it there instead?
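
A rough sketch of the relevant part (the dataset class, column names, and dtypes are just illustrative, not my actual code):

```python
import torch
import pandas as pd
from torch.utils.data import Dataset

class TabularDataset(Dataset):
    def __init__(self, X: pd.DataFrame, y: pd.Series):
        # X is what remains after y = X.pop(target_name)
        self.X = torch.tensor(X.values, dtype=torch.float32)
        # y is a Series, so y.values is a 1D array -> labels end up with shape [N]
        self.y = torch.tensor(y.values, dtype=torch.float32)
        # Alternative fix at the source: add .unsqueeze(1) here to get shape [N, 1]

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]
```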

I am happy for any kind of feedback :) If needed, I could also provide a fuller example of the whole process, but I think the question should work without it since it is a general problem.


Solution

  • There may be some loss functions that aggregate along a certain dimension, in which case the number of dimensions matters. In most cases this is not an issue; it mostly comes up in classification tasks, where the PyTorch loss functions are particular about the dimensions and format of their inputs. Barring that case, there should be no difference in the result of the operation whichever tensor you reshape (see the first sketch after this answer).

    That being said, at a lower level PyTorch is implemented with a C++ backend that optimizes vector operations along certain dimensions (see, for example, channels-first vs. channels-last computation). I'd suspect any such effect would be negligible from a computation-time perspective for your problem, unless your loss function itself is computation-heavy with large input dimensions. You could benchmark view and reshape to different final shapes if you wanted to see which is fastest (a small benchmark sketch is included after this answer).

    Using view or reshape is fine; the notable difference is that view throws an exception when the input is non-contiguous, whereas reshape handles that case gracefully (see the contiguity sketch after this answer). In general, you can ask yourself these two questions:

    1. During restructuring to match dimensions, do I perform any operations that are non-differentiable / have no gradient implemented in PyTorch? (Obviously these will break the computation graph.)
    2. Do I perform any operations with a non-trivial gradient? view and reshape have a trivial gradient of 1, i.e. they do not alter the gradient (see the last sketch after this answer).

    If you don't do either of these things, your final reshaping step is fine, though perhaps ever so slightly suboptimal in terms of computation speed.
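
First sketch: assuming a regression-style setup with nn.MSELoss and the shapes from the question, the loss is identical regardless of which tensor you reshape, while leaving the shapes mismatched triggers broadcasting (and a UserWarning):

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
output = torch.randn(64, 1)   # model output, shape [64, 1]
label = torch.randn(64)       # labels from the dataloader, shape [64]

# Mismatched shapes: [64, 1] and [64] broadcast to [64, 64],
# PyTorch emits a UserWarning and the loss value is not what you want.
# loss_wrong = loss_fn(output, label)

# These all give exactly the same scalar:
loss_a = loss_fn(output, label.view(-1, 1))   # reshape the label
loss_b = loss_fn(output.view(-1), label)      # reshape the output
loss_c = loss_fn(output.reshape(-1), label)   # reshape instead of view
print(loss_a.item(), loss_b.item(), loss_c.item())
```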
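Second, if you want to check the speed question empirically, a minimal benchmark sketch using torch.utils.benchmark (the sizes and the set of candidate operations are arbitrary):

```python
import torch
import torch.utils.benchmark as benchmark

label = torch.randn(64)  # 1D labels, as in the question

# Compare a few equivalent ways of producing shape [64, 1]
for stmt in ("label.view(-1, 1)", "label.reshape(-1, 1)", "label.unsqueeze(1)"):
    timer = benchmark.Timer(stmt=stmt, globals={"label": label})
    print(timer.timeit(10_000))
```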
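Third, the view vs. reshape difference in one example: view fails on a non-contiguous tensor, while reshape falls back to a copy when it has to:

```python
import torch

x = torch.randn(64, 2)
xt = x.t()                         # transposing makes the tensor non-contiguous
print(xt.is_contiguous())          # False

try:
    xt.view(-1)                    # view cannot rearrange non-contiguous memory
except RuntimeError as err:
    print("view failed:", err)

flat = xt.reshape(-1)              # reshape copies if needed, so this works
flat2 = xt.contiguous().view(-1)   # equivalent: make it contiguous first
print(flat.shape, flat2.shape)
```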
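Finally, a sketch of the two questions above: view/reshape are differentiable and pass the gradient through unchanged, whereas a non-differentiable op would break the graph:

```python
import torch
import torch.nn.functional as F

output = torch.randn(64, 1, requires_grad=True)
label = torch.randn(64)

# Reshaping inside the loss computation stays in the autograd graph;
# view/reshape simply pass the gradient through unchanged.
loss = F.mse_loss(output.view(-1), label)
loss.backward()
print(output.grad.shape)   # torch.Size([64, 1]) -- gradients flow back fine

# By contrast, a non-differentiable op such as output.argmax(dim=1)
# returns an integer tensor with no grad_fn and would break the graph.
```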