Tags: python, memory-address, pytorch, pycuda

How can I create a PyCUDA GPUArray from a gpu memory address?


I'm working with PyTorch and want to do some arithmetic on tensor data with the help of PyCUDA. I can get the memory address of a CUDA tensor t via t.data_ptr(). Can I somehow use this address, together with my knowledge of the size and data type, to initialize a GPUArray? I am hoping to avoid copying the data, but that would also be an acceptable alternative.


Solution

  • It turns out this is possible. We need a pointer to the data, wrapped in a class that provides some additional capabilities:

    from pycuda.driver import PointerHolderBase


    class Holder(PointerHolderBase):

        def __init__(self, tensor):
            super().__init__()
            # keep a reference to the tensor so its memory is not freed
            # while this pointer is still in use
            self.tensor = tensor
            self.gpudata = tensor.data_ptr()

        def get_pointer(self):
            return self.tensor.data_ptr()

        def __int__(self):
            return self.__index__()

        # without an __index__ method, arithmetic calls on the GPUArray backed
        # by this pointer fail; not sure why, but apparently this needs to
        # return some integer
        def __index__(self):
            return self.gpudata
    

    We can then use this class to instantiate GPUArrays. The code below uses Reikna arrays, which are a subclass of PyCUDA's GPUArray, but it should work with plain PyCUDA arrays as well.

    import pycuda.autoinit
    import reikna.cluda.cuda               # binds the name `reikna`
    from reikna import cluda as cuda       # import layout assumed from the calls below


    def tensor_to_gpuarray(tensor, context=pycuda.autoinit.context):
        '''Convert a :class:`torch.Tensor` to a :class:`pycuda.gpuarray.GPUArray`. The underlying
        storage will be shared, so that modifications to the array will reflect in the tensor object.

        Parameters
        ----------
        tensor  :   torch.Tensor

        Returns
        -------
        pycuda.gpuarray.GPUArray

        Raises
        ------
        ValueError
            If ``tensor`` does not live on the gpu
        '''
        if not tensor.is_cuda:
            raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
        thread = cuda.cuda_api().Thread(context)
        # torch_dtype_to_numpy is defined in the old answer further below
        return reikna.cluda.cuda.Array(thread, tensor.shape,
                                       dtype=torch_dtype_to_numpy(tensor.dtype),
                                       base_data=Holder(tensor))
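
    As a quick sanity check that the storage really is shared, we can write through the array and watch the change appear in the tensor. This is a minimal sketch assuming a CUDA device is available and the helpers above (plus torch_dtype_to_numpy from the old answer below) are in scope; fill() is inherited from pycuda.gpuarray.GPUArray:

    import torch

    t = torch.zeros(4, device='cuda')
    arr = tensor_to_gpuarray(t)   # wraps t's memory, no copy
    arr.fill(1.0)                 # write through the GPUArray...
    print(t)                      # ...and the ones show up in the tensor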
    

    We can go back the other way with the code below; I have not found a way to do this without copying the data.

    import torch
    import pycuda.driver


    def gpuarray_to_tensor(gpuarray, context=pycuda.autoinit.context):
        '''Convert a :class:`pycuda.gpuarray.GPUArray` to a :class:`torch.Tensor`. The underlying
        storage will NOT be shared, since a new copy must be allocated.

        Parameters
        ----------
        gpuarray  :   pycuda.gpuarray.GPUArray

        Returns
        -------
        torch.Tensor
        '''
        shape = gpuarray.shape
        dtype = gpuarray.dtype
        out_dtype = numpy_dtype_to_torch(dtype)   # inverse dtype mapping, sketched below
        out = torch.zeros(shape, dtype=out_dtype).cuda()
        # wrap the fresh tensor and copy the raw bytes into it device-to-device
        gpuarray_copy = tensor_to_gpuarray(out, context=context)
        byte_size = gpuarray.itemsize * gpuarray.size
        pycuda.driver.memcpy_dtod(gpuarray_copy.gpudata, gpuarray.gpudata, byte_size)
        return out
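
    The numpy_dtype_to_torch helper used above is not shown in the original. A minimal sketch of the inverse mapping (the name and implementation here are assumptions, mirroring torch_dtype_to_numpy from the old answer below):

    import numpy as np
    import torch


    def numpy_dtype_to_torch(dtype):
        # look up the torch dtype with the same name, e.g. 'float32' -> torch.float32
        return getattr(torch, np.dtype(dtype).name)

    With both helpers in place, gpuarray_to_tensor(tensor_to_gpuarray(t)) round-trips a tensor at the cost of one device-to-device copy.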
    

    Old answer

    import numpy as np
    from pycuda.gpuarray import GPUArray


    def torch_dtype_to_numpy(dtype):
        dtype_name = str(dtype)[6:]     # strip the 'torch.' prefix, e.g. 'torch.float32' -> 'float32'
        return getattr(np, dtype_name)


    def tensor_to_gpuarray(tensor):
        if not tensor.is_cuda:
            raise ValueError('Cannot convert CPU tensor to GPUArray (call `cuda()` on it)')
        array = GPUArray(tensor.shape, dtype=torch_dtype_to_numpy(tensor.dtype),
                         gpudata=tensor.data_ptr())
        # copying appears to turn the array into a usable format (see below)
        return array.copy()
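
    A quick check of the dtype mapping (assuming the usual numpy and torch imports):

    import numpy as np
    import torch

    assert torch_dtype_to_numpy(torch.float32) is np.float32
    assert torch_dtype_to_numpy(torch.int64) is np.int64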
    

    Unfortunately, passing an int as the gpudata keyword (or a subtype of pycuda.driver.PointerHolderBase, as was suggested in the PyTorch forum) seems to work on the surface, but many operations fail with seemingly unrelated errors. Copying the array seems to transform it into a usable format, though. I think this is related to the fact that the gpudata member should be a pycuda.driver.DeviceAllocation object, which apparently cannot be instantiated from Python.

    Now how to go back from the raw data to a Tensor is another matter.