Tags: python, memory-address, pytorch, pycuda

How can I create a PyCUDA GPUArray from a gpu memory address?


I'm working with PyTorch and want to do some arithmetic on tensor data with the help of PyCUDA. I can get the memory address of a CUDA tensor t via t.data_ptr(). Can I somehow use this address, together with my knowledge of the size and data type, to initialize a GPUArray? I am hoping to avoid copying the data, but that would also be an acceptable alternative.


Solution

  • It turns out this is possible. We need a pointer to the data, wrapped in a class that provides some additional capabilities:

    from pycuda.driver import PointerHolderBase


    class Holder(PointerHolderBase):

        def __init__(self, tensor):
            super().__init__()
            # keep a reference to the tensor so its memory is not freed
            # while this pointer is still in use
            self.tensor = tensor
            self.gpudata = tensor.data_ptr()

        def get_pointer(self):
            return self.tensor.data_ptr()

        def __int__(self):
            return self.__index__()

        # without an __index__ method, arithmetic calls on the GPUArray backed
        # by this pointer fail; not sure why, but apparently this needs to
        # return some integer
        def __index__(self):
            return self.gpudata
    

    We can then use this class to instantiate GPUArrays. The code below uses Reikna arrays, which are a subclass of PyCUDA's GPUArray, but it should work with plain PyCUDA arrays as well.

    import pycuda.autoinit
    import reikna.cluda.cuda               # binds the name `reikna`
    from reikna import cluda as cuda       # import layout assumed from the calls below


    def tensor_to_gpuarray(tensor, context=pycuda.autoinit.context):
        '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
        storage will be shared, so that modifications to the array will reflect in the tensor object.

        Parameters
        ----------
        tensor  :   torch.Tensor

        Returns
        -------
        pycuda.gpuarray.GPUArray

        Raises
        ------
        ValueError
            If ``tensor`` does not live on the gpu
        '''
        if not tensor.is_cuda:
            raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
        thread = cuda.cuda_api().Thread(context)
        # torch_dtype_to_numpy is defined in the old answer further below
        return reikna.cluda.cuda.Array(thread, tensor.shape,
                                       dtype=torch_dtype_to_numpy(tensor.dtype),
                                       base_data=Holder(tensor))
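
    As a quick sanity check that the storage really is shared, we can write through the array and watch the change appear in the tensor. This is a minimal sketch assuming a CUDA device is available and the helpers above (plus torch_dtype_to_numpy from the old answer below) are in scope; fill() is inherited from pycuda.gpuarray.GPUArray:

    import torch

    t = torch.zeros(4, device='cuda')
    arr = tensor_to_gpuarray(t)   # wraps t's memory, no copy
    arr.fill(1.0)                 # write through the GPUArray...
    print(t)                      # ...and the ones show up in the tensor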
    

    We can go back the other way with the code below; I have not found a way to do this without copying the data.

    import torch
    import pycuda.driver


    def gpuarray_to_tensor(gpuarray, context=pycuda.autoinit.context):
        '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
        storage will NOT be shared, since a new copy must be allocated.

        Parameters
        ----------
        gpuarray  :   pycuda.gpuarray.GPUArray

        Returns
        -------
        torch.Tensor
        '''
        shape = gpuarray.shape
        dtype = gpuarray.dtype
        out_dtype = numpy_dtype_to_torch(dtype)   # inverse dtype mapping, sketched below
        out = torch.zeros(shape, dtype=out_dtype).cuda()
        # wrap the fresh tensor and copy the raw bytes into it device-to-device
        gpuarray_copy = tensor_to_gpuarray(out, context=context)
        byte_size = gpuarray.itemsize * gpuarray.size
        pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
        return out
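
    The numpy_dtype_to_torch helper used above is not shown in the original. A minimal sketch of the inverse mapping (the name and implementation here are assumptions, mirroring torch_dtype_to_numpy from the old answer below):

    import numpy as np
    import torch


    def numpy_dtype_to_torch(dtype):
        # look up the torch dtype with the same name, e.g. 'float32' -> torch.float32
        return getattr(torch, np.dtype(dtype).name)

    With both helpers in place, gpuarray_to_tensor(tensor_to_gpuarray(t)) round-trips a tensor at the cost of one device-to-device copy.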
    

    Old answer

    import numpy as np
    from pycuda.gpuarray import GPUArray


    def torch_dtype_to_numpy(dtype):
        dtype_name = str(dtype)[6:]     # strip the 'torch.' prefix, e.g. 'torch.float32' -> 'float32'
        return getattr(np, dtype_name)


    def tensor_to_gpuarray(tensor):
        if not tensor.is_cuda:
            raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=tensor.data_ptr())
        # copying appears to turn the array into a usable format (see below)
        return array.copy()
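
    A quick check of the dtype mapping (assuming the usual numpy and torch imports):

    import numpy as np
    import torch

    assert torch_dtype_to_numpy(torch.float32) is np.float32
    assert torch_dtype_to_numpy(torch.int64) is np.int64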
    

    Unfortunately, passing an int as the gpudata keyword (or a subtype of pycuda.driver.PointerHolderBase, as was suggested in the PyTorch forum) seems to work on the surface, but many operations fail with seemingly unrelated errors. Copying the array seems to transform it into a usable format, though. I think this is related to the fact that the gpudata member should be a pycuda.driver.DeviceAllocation object, which apparently cannot be instantiated from Python.

    Now how to go back from the raw data to a Tensor is another matter.