My program has two kernels. The second kernel should reuse the input data that is already on the device, together with the results of the first kernel, so that I can avoid redundant memory transfers. How would I achieve this?
This is how I launch my kernels:
result = gpuarray.zeros(points, dtype=np.float32)
kernel(
    driver.In(dataT), result, np.int32(points),
    grid=(blocks, 1),
    block=(block_size, 1, 1),
)
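PyCUDA does not transfer data to or from the device unless you explicitly request it. Note that wrapping a host array in driver.In is such a request: the array is copied to the device on every launch that uses it. To keep data resident on the GPU instead, allocate it once and transfer it with:
result = np.zeros((height, width), dtype=np.float64)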
result_device = gpuarray.to_gpu(result)
The variable result_device is a reference to the data on the GPU. You can pass result_device to any other kernel without incurring a memory transfer back to the CPU. The next transfer happens only when you explicitly copy the data back to the host:
result = result_device.get()
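
Applied to the two-kernel case from the question, this could look roughly like the sketch below. The kernel names (first_kernel, second_kernel), their bodies, and the helper names run_two_kernels and reference are made up for illustration, since the original kernels were not posted. The point is only that the input is uploaded once with gpuarray.to_gpu, both kernels receive the same device arrays, and the single download is the final .get(). It assumes a non-empty one-dimensional float32 input and a working CUDA device.

```python
import numpy as np

# Hypothetical kernels, for illustration only: the first doubles the input,
# the second adds the input to the first kernel's result.
KERNEL_SOURCE = """
__global__ void first_kernel(const float *data, float *first, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) first[i] = 2.0f * data[i];
}

__global__ void second_kernel(const float *data, const float *first,
                              float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[i] + first[i];
}
"""


def reference(data):
    """CPU version of the two hypothetical kernels, for checking results."""
    return data + 2.0 * data


def run_two_kernels(data_host, block_size=256):
    """Upload once, launch both kernels on device arrays, download once."""
    # Imported here because pycuda.autoinit creates a CUDA context on import.
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    module = SourceModule(KERNEL_SOURCE)
    first_kernel = module.get_function("first_kernel")
    second_kernel = module.get_function("second_kernel")

    points = data_host.size
    blocks = (points + block_size - 1) // block_size

    # The only host-to-device copy: the input stays on the GPU afterwards.
    data_device = gpuarray.to_gpu(data_host.astype(np.float32))
    first_device = gpuarray.empty_like(data_device)
    out_device = gpuarray.empty_like(data_device)

    # GPUArray objects are passed directly, so no transfer happens here.
    first_kernel(data_device, first_device, np.int32(points),
                 grid=(blocks, 1), block=(block_size, 1, 1))
    second_kernel(data_device, first_device, out_device, np.int32(points),
                  grid=(blocks, 1), block=(block_size, 1, 1))

    # The only device-to-host copy.
    return out_device.get()
```

On a machine with a GPU, run_two_kernels(np.arange(1000, dtype=np.float32)) should agree with reference(...) on the same input. Compared with the launch in the question, the only real change is replacing driver.In(dataT) with a GPUArray that is created once and reused.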