Tags: python, arrays, pytorch

I'm learning PyTorch and don't understand from_numpy() behaviour


I'm currently learning PyTorch, and encountered some unexpected behaviour when using the torch.from_numpy() function.

import torch as t
import numpy as np

array = np.arange(1, 10)
tensor = t.from_numpy(array)
print(array, tensor)
array[:] += 1 
print(array, tensor)

outputs to this:

[1 2 3 4 5 6 7 8 9] tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
[ 2  3  4  5  6  7  8  9 10] tensor([ 2,  3,  4,  5,  6,  7,  8,  9, 10])

When I run the above code, the PyTorch tensor changes when the NumPy array is changed, and vice versa. This is the expected behaviour according to the PyTorch documentation for from_numpy.

However, when I change the code a bit to:

import torch as t
import numpy as np

array = np.arange(1, 10)
tensor = t.from_numpy(array)
print(array, tensor)
array = array +1
print(array, tensor)

the output becomes:

[1 2 3 4 5 6 7 8 9] tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
[ 2  3  4  5  6  7  8  9 10] tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])

Weirdly, if I change that line to array += 1, the tensor and the NumPy array behave as expected. Can anyone explain why? I'm running this on Google Colab, using the CPU.


Solution

  • The difference is that array[:] += 1 is an in-place operation, while array = array + 1 is not.

    To start, let's create the array and the tensor and look at their object IDs.

    import numpy as np
    import torch

    array = np.arange(1, 10)
    tensor = torch.from_numpy(array)
    print(id(array), id(tensor))
    > 140232345845456 140232355106384
    

    In the above, array and tensor are different Python objects, so they have different IDs. Under the hood, however, tensor points to the same block of memory as array. That is the whole point of torch.from_numpy: it creates a tensor that references the same memory, which avoids copying the data from the NumPy array.
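    One way to see the shared buffer directly (a sanity check, not part of the original answer): NumPy exposes an array's base address via __array_interface__, and torch tensors expose theirs via data_ptr(). For a tensor created with from_numpy, the two addresses match.

```python
import numpy as np
import torch

array = np.arange(1, 10)
tensor = torch.from_numpy(array)

# Different objects, but both views sit on the same underlying buffer:
# numpy reports its base address, torch reports the tensor's data pointer.
print(array.__array_interface__['data'][0] == tensor.data_ptr())  # True
print(np.shares_memory(array, tensor.numpy()))                    # True
```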

    Now we update with array[:] += 1. This is an in-place operation, meaning it mutates the underlying data of array. When we print the IDs of array and tensor, note that they are the same as above: we are still looking at the same two objects. Printing the values shows that adding 1 updated array, and tensor along with it, because tensor references the same memory that the in-place operation just modified.

    array[:] += 1
    print(id(array), id(tensor))
    > 140232345845456 140232355106384
    
    print(array, tensor)
    > [ 2  3  4  5  6  7  8  9 10] tensor([ 2,  3,  4,  5,  6,  7,  8,  9, 10])
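    The link works in both directions, as the question notes ("and vice versa"): an in-place operation on the tensor side, such as the underscore method add_, shows up in the array too. A minimal check:

```python
import numpy as np
import torch

array = np.arange(1, 10)
tensor = torch.from_numpy(array)

# In-place tensor ops (the underscore methods) mutate the shared buffer,
# so the change is visible through the numpy array as well.
tensor.add_(1)
print(array)  # [ 2  3  4  5  6  7  8  9 10]
```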
    

    Now we update with array = array + 1. This is not an in-place operation: array + 1 allocates a brand-new array, and the assignment rebinds the name array to that new object. When we look at the ID values, we see that array now has a different ID, while tensor keeps the same one.

    The variable array now references the new object, while tensor still references the old one. This is why array = array + 1 updates array but not tensor.

    array = array + 1
    print(id(array), id(tensor))
    > 139781912744080 140232355106384
    
    print(array, tensor)
    > [ 3  4  5  6  7  8  9 10 11] tensor([ 2,  3,  4,  5,  6,  7,  8,  9, 10])
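    If the sharing is not wanted, make a copy instead: torch.tensor(array) copies the data (as does torch.from_numpy(array).clone()). A quick sketch of the difference:

```python
import numpy as np
import torch

array = np.arange(1, 10)

shared = torch.from_numpy(array)  # shares memory with array
copied = torch.tensor(array)      # copies the data into a new buffer

array[:] += 1
print(shared)  # follows the array: tensor([ 2, ..., 10])
print(copied)  # unaffected:        tensor([1, ..., 9])
```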