python numpy pytorch numpy-ndarray pytorch-dataloader

Converting np.int16 to torch.ShortTensor

I have many NumPy arrays of dtype np.int16 that I need to convert to torch.Tensor within a torch.utils.data.Dataset. Ideally, each np.int16 array would become a torch.ShortTensor, i.e. a tensor of dtype torch.int16.

As far as I can tell, torch.from_numpy(array) converts the data to torch.float64, which takes 4x more memory than torch.int16 (64 bits vs. 16 bits per element). I have a LOT of data, so I care about this.

How can I convert a NumPy array to a torch.Tensor while minimizing memory use?

Solution

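Converting a NumPy array to a torch tensor. Both torch.as_tensor and torch.from_numpy keep the array's dtype, so an np.int16 array becomes a torch.int16 tensor that shares memory with the array instead of being upcast: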

import numpy as np
import torch

array = np.ones((1000, 1000), dtype=np.int16)
print("NP Array size: {}".format(array.nbytes))
t = torch.as_tensor(array)  # as_tensor reuses the array's memory, no copy
print("Torch tensor type: {}".format(t.dtype))
# bytes per element times number of elements
print("Torch tensor size: {}".format(t.element_size() * t.nelement()))

NP Array size: 2000000
Torch tensor type: torch.int16
Torch tensor size: 2000000
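
Since the question mentions torch.utils.data.Dataset, here is a minimal sketch of how the same conversion might look inside one. The class name Int16ArrayDataset and the small in-memory list of arrays are hypothetical, made up for illustration; your own loading logic will differ.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class Int16ArrayDataset(Dataset):
    """Hypothetical dataset wrapping a list of np.int16 arrays."""

    def __init__(self, arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, idx):
        # from_numpy keeps the NumPy dtype and shares memory with the array
        return torch.from_numpy(self.arrays[idx])


# Toy data standing in for the real arrays (assumption for this example)
arrays = [np.ones((4, 4), dtype=np.int16) for _ in range(3)]
dataset = Int16ArrayDataset(arrays)

sample = dataset[0]
print(sample.dtype)                               # torch.int16
print(sample.element_size() * sample.nelement())  # 32 (16 elements * 2 bytes)

# Writing through the tensor also changes the source array (shared memory)
sample[0, 0] = 7
print(arrays[0][0, 0])                            # 7
```

If the model needs floating-point input, one option is to cast each batch (for example with .float()) just before the forward pass, so the stored data stays in int16.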