Converting np.int16 to torch.ShortTensor


I have many NumPy arrays of dtype np.int16 that I need to convert to torch.Tensor within a torch.utils.data.Dataset. Ideally, each np.int16 array would be converted to a torch.ShortTensor, i.e. a tensor of dtype torch.int16.

torch.from_numpy(array) will convert the data to torch.float64, which takes up 4X more memory than torch.int16 (64 bits vs 16 bits). I have a LOT of data, so I care about this.

How can I convert a NumPy array to a torch.Tensor while minimizing memory use?


Solution

  • Converting a NumPy array to a torch tensor:

    import numpy as np
    import torch

    array = np.ones((1000, 1000), dtype=np.int16)
    print("NP Array size: {}".format(array.nbytes))
    # as_tensor keeps the dtype and shares memory with the array (no copy)
    t = torch.as_tensor(array)
    print("Torch tensor type: {}".format(t.dtype))
    # bytes per element times number of elements
    print("Torch tensor size: {}".format(t.element_size() * t.nelement()))
    
    NP Array size: 2000000
    Torch tensor type: torch.int16
    Torch tensor size: 2000000
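
Since the question is about a torch.utils.data.Dataset, here is a minimal sketch of how this might look inside one. The class name Int16ArrayDataset and the toy arrays are made up for illustration, not taken from the question. It uses torch.from_numpy, which keeps the NumPy dtype for np.int16 input and shares memory with the source array, so no float64 copy should be created.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class Int16ArrayDataset(Dataset):
    """Hypothetical dataset wrapping a list of np.int16 arrays."""

    def __init__(self, arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        # from_numpy keeps the dtype (int16 here) and shares memory with the array
        return torch.from_numpy(self.arrays[idx])


# Toy data, assumed for the example
arrays = [np.full((4, 4), i, dtype=np.int16) for i in range(3)]
dataset = Int16ArrayDataset(arrays)

sample = dataset[1]
print(sample.dtype)  # torch.int16

# Writing through the tensor changes the underlying array: no copy was made
sample[0, 0] = 7
print(arrays[1][0, 0])  # 7
```

If a later step needs floats, converting per batch (for example with .float() inside the training loop) keeps the stored data at 16 bits per element.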