Tags: python, numpy, view, pytorch, reshape

PyTorch: different outputs when reshaping with view vs. transpose


Say I have a tensor of shape (B, N^2, C) and I want to reshape it into (B, C, N, N).

I think I have the two choices below:

import torch

A = torch.rand(5, 100, 20)  # original tensor

# First method: transpose, then view
B = A.transpose(2, 1)
B = B.view(5, 20, 10, 10)

# Second method: view directly
C = A.view(5, 20, 10, 10)

Both methods work, but the outputs are slightly different and I cannot figure out the difference between them.

Thanks


Solution

  • The difference between B and C is that you have used torch.transpose, which means you have swapped two axes. Strictly speaking, transpose doesn't move any data: it only swaps the strides, so the same underlying buffer gets read in a different order. The view at the end is just a nice interface to access your data; it has no effect on the underlying data of your tensor. What it comes down to is how each tensor maps its indices onto a contiguous memory buffer.
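
    To see that transpose only touches this metadata, not the bytes themselves, you can compare storage addresses and strides (a minimal sketch; data_ptr returns the address of a tensor's underlying storage):

    >>> A = torch.rand(5, 100, 20)
    >>> B = A.transpose(2, 1)
    >>> B.data_ptr() == A.data_ptr()  # same buffer, nothing was copied
    True
    >>> A.stride(), B.stride()        # only the last two strides were swapped
    ((2000, 20, 1), (2000, 1, 20))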

    If you take a smaller example, something we can grasp more easily:

    >>> A = torch.rand(1, 4, 3)
    >>> A
    tensor([[[0.4543, 0.9766, 0.0123],
             [0.7447, 0.2732, 0.7260],
             [0.7814, 0.4766, 0.8939],
             [0.3444, 0.0387, 0.8581]]])
    

    Here swapping axis=1 and axis=2 comes down to a batched transpose (in mathematical terms):

    >>> B = A.transpose(2, 1)
    >>> B
    tensor([[[0.4543, 0.7447, 0.7814, 0.3444],
             [0.9766, 0.2732, 0.4766, 0.0387],
             [0.0123, 0.7260, 0.8939, 0.8581]]])
    

    In terms of memory, A has the following arrangement:

    >>> A.flatten()
    tensor([0.4543, 0.9766, 0.0123, 0.7447, 0.2732, 0.7260, 0.7814, 0.4766, 0.8939,
            0.3444, 0.0387, 0.8581])
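
    Since A is contiguous, flatten doesn't even need to copy here: it can return a view over the very same buffer, which is why it faithfully shows the in-memory order (a small check, assuming the usual behaviour that flatten only copies when it must):

    >>> A.flatten().data_ptr() == A.data_ptr()
    True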
    

    While B has a different layout. By layout I mean the order in which the elements are read out, not its shape, which is irrelevant here (B is not contiguous, so flatten actually has to copy the data into this order):

    >>> B.flatten()
    tensor([0.4543, 0.7447, 0.7814, 0.3444, 0.9766, 0.2732, 0.4766, 0.0387, 0.0123,
            0.7260, 0.8939, 0.8581])
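
    The strides make this concrete: they say how many elements of the buffer you skip to advance by one step along each axis. Transposing swapped B's last two strides while leaving the buffer in place, so B is no longer contiguous and flatten has to copy its elements out into this new order. A small sketch:

    >>> A.stride()
    (12, 3, 1)
    >>> B.stride()  # same buffer as A, last two strides swapped
    (12, 1, 3)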
    

    As I said, reshaping, i.e. building a view on top of a tensor, doesn't change its memory layout; it's an abstraction layer that lets you manipulate tensors more conveniently.

    So in the end, yes, you end up with two different results: both B and C are views sharing A's underlying data (neither method copies anything), but C reinterprets the buffer in its original order while B reads it in the transposed order, so the same values end up in different positions.
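
    You can verify this directly (a minimal check on the tensors from the question; data_ptr compares underlying storage addresses):

    >>> A = torch.rand(5, 100, 20)
    >>> B = A.transpose(2, 1).view(5, 20, 10, 10)
    >>> C = A.view(5, 20, 10, 10)
    >>> B.data_ptr() == A.data_ptr() == C.data_ptr()  # all three share one buffer
    True
    >>> torch.equal(B, C)  # but the elements sit at different positions
    False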