I'm working with PyTorch and want to do some arithmetic on Tensor data with the help of PyCUDA. I can get the memory address of a CUDA tensor t via t.data_ptr(). Can I somehow use this address and my knowledge of the size and data type to initialize a GPUArray? I am hoping to avoid copying the data, but that would also be an alternative.
It turns out this is possible. We need a pointer to the data, wrapped in a class that provides some additional capabilities:
from pycuda.driver import PointerHolderBase


class Holder(PointerHolderBase):

    def __init__(self, tensor):
        super().__init__()
        self.tensor = tensor
        self.gpudata = tensor.data_ptr()

    def get_pointer(self):
        return self.tensor.data_ptr()

    def __int__(self):
        return self.__index__()

    # Without an __index__ method, arithmetic calls on the GPUArray backed by
    # this pointer fail. Not sure why, but this apparently needs to return
    # some integer.
    def __index__(self):
        return self.gpudata
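The role of __index__ can be illustrated without a GPU: an object that implements __index__ can stand in anywhere Python expects an integer, which is what lets the pointer holder be consumed as a raw address. A minimal stdlib-only sketch (FakePointer is a hypothetical stand-in, unrelated to CUDA itself):

```python
class FakePointer:
    '''Mimics Holder's integer protocol: __index__ lets Python treat
    the object as the raw address it wraps.'''

    def __init__(self, addr):
        self.gpudata = addr

    def __int__(self):
        return self.__index__()

    def __index__(self):
        return self.gpudata


p = FakePointer(0xDEADBEEF)
print(int(p))   # 3735928559
print(hex(p))   # '0xdeadbeef' -- hex() requires __index__, not just __int__
```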
We can then use this class to instantiate GPUArrays. The code uses Reikna arrays, which are a subclass, but it should work with plain PyCUDA arrays as well.
import pycuda.autoinit
import reikna.cluda as cuda
import reikna.cluda.cuda


def tensor_to_gpuarray(tensor, context=pycuda.autoinit.context):
    '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
    storage will be shared, so that modifications to the array will reflect in the tensor object.

    Parameters
    ----------
    tensor  :   torch.Tensor

    Returns
    -------
    pycuda.gpuarray.GPUArray

    Raises
    ------
    ValueError
        If the ``tensor`` does not live on the gpu
    '''
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    else:
        thread = cuda.cuda_api().Thread(context)
        return reikna.cluda.cuda.Array(thread, tensor.shape,
                                       dtype=torch_dtype_to_numpy(tensor.dtype),
                                       base_data=Holder(tensor))
We can go back the other way with the code below; I have not found a way to do this without copying the data.
import torch
import pycuda.autoinit
import pycuda.driver


def gpuarray_to_tensor(gpuarray, context=pycuda.autoinit.context):
    '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
    storage will NOT be shared, since a new copy must be allocated.

    Parameters
    ----------
    gpuarray  :   pycuda.gpuarray.GPUArray

    Returns
    -------
    torch.Tensor
    '''
    shape = gpuarray.shape
    dtype = gpuarray.dtype
    out_dtype = numpy_dtype_to_torch(dtype)
    out = torch.zeros(shape, dtype=out_dtype).cuda()
    gpuarray_copy = tensor_to_gpuarray(out, context=context)
    byte_size = gpuarray.itemsize * gpuarray.size
    pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
    return out
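The helper numpy_dtype_to_torch used above is not shown in the answer. A minimal sketch of what it might look like, mirroring the torch_dtype_to_numpy helper below; the name-based lookup via getattr(torch, ...) is my assumption, not from the original:

```python
import numpy as np
import torch


def numpy_dtype_to_torch(dtype):
    '''Map a numpy dtype to the torch dtype of the same name.
    Sketch only: assumes the numpy type name (e.g. 'float32')
    exists as an attribute on the torch module.'''
    return getattr(torch, np.dtype(dtype).name)
```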
import numpy as np
import torch
from pycuda.gpuarray import GPUArray


def torch_dtype_to_numpy(dtype):
    dtype_name = str(dtype)[6:]  # remove 'torch.' prefix
    return getattr(np, dtype_name)


def tensor_to_gpuarray(tensor):
    if not tensor.is_cuda:
        raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
    else:
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=tensor.data_ptr())
        return array.copy()
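The dtype-name trick in torch_dtype_to_numpy can be sanity-checked on the CPU; it only relies on str(torch.float32) rendering as 'torch.float32':

```python
import numpy as np
import torch

# str() of a torch dtype looks like 'torch.float32'; slicing off the
# 'torch.' prefix leaves a name that numpy exposes as an attribute.
name = str(torch.float32)[6:]
print(name)                             # float32
print(getattr(np, name) is np.float32)  # True
```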
Unfortunately, passing an int as the gpudata keyword (or a subtype of pycuda.driver.PointerHolderBase, as was suggested in the pytorch forum) seems to work on the surface, but many operations fail with seemingly unrelated errors. Copying the array seems to transform it into a usable format though.
I think it is related to the fact that the gpudata member should be a pycuda.driver.DeviceAllocation object, which it seems cannot be instantiated from Python.
Now how to go back from the raw data to a Tensor is another matter.