How to tell PyCUDA to reuse the memory from an earlier kernel?

My program has two kernels. The second kernel should use the input data that has already been uploaded, together with the results from the first kernel, so that I can avoid the extra memory transfers. How would I achieve this?

This is how I launch my kernels:

result = gpuarray.zeros(points, dtype=np.float32)  

kernel(
    driver.In(dataT),result,np.int32(points),
    grid = (blocks,1),
    block = (block_size, 1, 1),
)

Solution

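PyCUDA only transfers data to or from the device when you explicitly request it. In your launch, driver.In(dataT) is such a request: it uploads dataT again every time the kernel is called. To keep data on the device, allocate it and upload it once, for example with: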

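# Host array; height and width are defined elsewhere
result = np.zeros((height, width), dtype=np.float64)
# Single upload; result_device now lives in GPU memory
result_device = gpuarray.to_gpu(result)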

The variable result_device is a reference to the data on the GPU. You can pass result_device to any number of kernels without incurring a memory transfer back to the CPU. The data is copied back to the host only when you call:

result = result_device.get()
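
Applied to the two-kernel case from the question, this could look roughly like the sketch below. The kernel bodies and the names first_kernel, second_kernel, KERNEL_SOURCE and run_two_kernels are hypothetical and only there for illustration; the real kernels will differ. The input is uploaded once with gpuarray.to_gpu, both kernels receive the same device arrays, and the only download is the final .get().

```python
import numpy as np

# Hypothetical kernels for illustration only; they are not the kernels from the question.
KERNEL_SOURCE = r"""
__global__ void first_kernel(const float *data, float *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) result[i] = 2.0f * data[i];
}

__global__ void second_kernel(const float *data, const float *result,
                              float *final_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) final_out[i] = result[i] + data[i];
}
"""


def run_two_kernels(data, block_size=256):
    """Run both kernels on one upload of `data` and return the final host array."""
    # Imported here so that a CUDA context is only created when the function runs.
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    module = SourceModule(KERNEL_SOURCE)
    first = module.get_function("first_kernel")
    second = module.get_function("second_kernel")

    n = data.size
    blocks = (n + block_size - 1) // block_size
    launch = dict(grid=(blocks, 1), block=(block_size, 1, 1))

    # The only host-to-device copy of the input.
    data_gpu = gpuarray.to_gpu(np.ascontiguousarray(data, dtype=np.float32))
    result_gpu = gpuarray.zeros(n, dtype=np.float32)
    final_gpu = gpuarray.empty(n, dtype=np.float32)

    # Both launches receive device arrays, so no data is copied between them.
    first(data_gpu, result_gpu, np.int32(n), **launch)
    second(data_gpu, result_gpu, final_gpu, np.int32(n), **launch)

    # The only device-to-host copy.
    return final_gpu.get()
```

Compared with the launch in the question, the main change is that driver.In(dataT) is replaced by a device array created once up front, which is then handed to both kernels.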