I have many NumPy arrays of dtype np.int16 that I need to convert to torch.Tensor within a torch.utils.data.Dataset. Ideally, each np.int16 array gets converted to a torch.ShortTensor, i.e. a tensor of dtype torch.int16 (docs).
torch.from_numpy(array) will convert the data to torch.float64, which takes up 4x more memory than torch.int16 (64 bits vs. 16 bits). I have a LOT of data, so I care about this.
How can I convert a NumPy array to a torch.Tensor while minimizing memory use?
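Both torch.as_tensor and torch.from_numpy keep the NumPy dtype, so an np.int16 array becomes a torch.int16 tensor of the same size. Converting a NumPy array to a torch tensor:
import numpy as np
import torch

array = np.ones((1000, 1000), dtype=np.int16)
print("NP Array size: {}".format(array.nbytes))
t = torch.as_tensor(array)  # as_tensor shares memory with the array instead of copying it
print("Torch tensor type: {}".format(t.dtype))
# bytes per element times number of elements gives the tensor's data size
print("Torch tensor size: {}".format(t.element_size() * t.nelement()))
Output:
NP Array size: 2000000
Torch tensor type: torch.int16
Torch tensor size: 2000000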
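Since the question mentions a torch.utils.data.Dataset, here is a minimal sketch of how this could look inside one. The class name Int16ArrayDataset and the assumption that the arrays are already loaded into a Python list are made up for illustration; the real loading logic will differ.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class Int16ArrayDataset(Dataset):
    """Hypothetical dataset wrapping a list of np.int16 arrays."""

    def __init__(self, arrays):
        self.arrays = arrays

    def __len__(self):
        return len(self.arrays)

    def __getitem__(self, index):
        # from_numpy keeps the NumPy dtype and shares memory with the array,
        # so no copy and no upcast happens here
        return torch.from_numpy(self.arrays[index])


# Small illustrative data: three 4x4 arrays of int16
arrays = [np.ones((4, 4), dtype=np.int16) for _ in range(3)]
dataset = Int16ArrayDataset(arrays)
sample = dataset[0]

print(sample.dtype)                               # torch.int16
print(sample.element_size() * sample.nelement())  # 32 (16 elements * 2 bytes)
```

Because the tensor shares memory with the source array, writing to one changes the other. If the model needs floats, one option is to cast per batch (e.g. with .float()) right before the forward pass, so that only the current batch is held at the wider dtype.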