Tags: python, numpy, view, pytorch, reshape

PyTorch: different outputs when reshaping with view vs. transpose


Say I have a tensor of shape (B, N^2, C) and I want to reshape it into (B, C, N, N).

I think I have the two choices below:

import torch

A = torch.rand(5, 100, 20)  # original tensor

# First method: transpose, then view
B = A.transpose(2, 1)
B = B.view(5, 20, 10, 10)

# Second method: view directly
C = A.view(5, 20, 10, 10)

Both methods work, but the outputs are slightly different and I cannot figure out the difference between them.

Thanks


Solution

  • The difference between B and C is that you have used torch.transpose, which means you have swapped two axes. Strictly speaking, transpose doesn't move any data: it only swaps the strides, so the same underlying buffer gets read in a different order. The view at the end is just a nice interface to access your data; it has no effect on the underlying data of your tensor. What it comes down to is how each tensor maps its indices onto a contiguous memory buffer.
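
    To see that transpose only touches this metadata, not the bytes themselves, you can compare storage addresses and strides (a minimal sketch; data_ptr returns the address of a tensor's underlying storage):

    >>> A = torch.rand(5, 100, 20)
    >>> B = A.transpose(2, 1)
    >>> B.data_ptr() == A.data_ptr()  # same buffer, nothing was copied
    True
    >>> A.stride(), B.stride()        # only the last two strides were swapped
    ((2000, 20, 1), (2000, 1, 20))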

    If you take a smaller example, something we can grasp more easily:

    >>> A = torch.rand(1, 4, 3)
    >>> A
    tensor([[[0.4543, 0.9766, 0.0123],
             [0.7447, 0.2732, 0.7260],
             [0.7814, 0.4766, 0.8939],
             [0.3444, 0.0387, 0.8581]]])
    

    Here swapping axis=1 and axis=2 comes down to a batched transpose (in mathematical terms):

    >>> B = A.transpose(2, 1)
    >>> B
    tensor([[[0.4543, 0.7447, 0.7814, 0.3444],
             [0.9766, 0.2732, 0.4766, 0.0387],
             [0.0123, 0.7260, 0.8939, 0.8581]]])
    

    In terms of memory, A has the following arrangement:

    >>> A.flatten()
    tensor([0.4543, 0.9766, 0.0123, 0.7447, 0.2732, 0.7260, 0.7814, 0.4766, 0.8939,
            0.3444, 0.0387, 0.8581])
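
    Since A is contiguous, flatten doesn't even need to copy here: it can return a view over the very same buffer, which is why it faithfully shows the in-memory order (a small check, assuming the usual behaviour that flatten only copies when it must):

    >>> A.flatten().data_ptr() == A.data_ptr()
    True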
    

    While B has a different layout. By layout I mean the order in which the elements are read out, not its shape, which is irrelevant here (B is not contiguous, so flatten actually has to copy the data into this order):

    >>> B.flatten()
    tensor([0.4543, 0.7447, 0.7814, 0.3444, 0.9766, 0.2732, 0.4766, 0.0387, 0.0123,
            0.7260, 0.8939, 0.8581])
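
    The strides make this concrete: they say how many elements of the buffer you skip to advance by one step along each axis. Transposing swapped B's last two strides while leaving the buffer in place, so B is no longer contiguous and flatten has to copy its elements out into this new order. A small sketch:

    >>> A.stride()
    (12, 3, 1)
    >>> B.stride()  # same buffer as A, last two strides swapped
    (12, 1, 3)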
    

    As I said, reshaping, i.e. building a view on top of a tensor, doesn't change its memory layout; it's an abstraction layer that lets you manipulate tensors more conveniently.

    So in the end, yes, you end up with two different results: both B and C are views sharing A's underlying data (neither method copies anything), but C reinterprets the buffer in its original order while B reads it in the transposed order, so the same values end up in different positions.
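
    You can verify this directly (a minimal check on the tensors from the question; data_ptr compares underlying storage addresses):

    >>> A = torch.rand(5, 100, 20)
    >>> B = A.transpose(2, 1).view(5, 20, 10, 10)
    >>> C = A.view(5, 20, 10, 10)
    >>> B.data_ptr() == A.data_ptr() == C.data_ptr()  # all three share one buffer
    True
    >>> torch.equal(B, C)  # but the elements sit at different positions
    False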