I want to quickly copy a small piece (several bytes) of a huge GPU buffer to the CPU. ILGPU's CopyToCPU takes a long time because it copies the entire buffer. Is there a way in ILGPU to copy only a small slice of an array, for speed's sake?
Found the answer:
// (This is CPU code)
// define the dimensions of a cubic 3D buffer (1000^3 voxels; about 1 GB if VoxelType is one byte)
var fieldSize = new Index3D(1000, 1000, 1000);
// allocate that buffer in the GPU
using var field = accelerator.Allocate3DDenseXY<VoxelType>(fieldSize);
// run the kernel
kernel(fieldSize, ...);
// wait for it to complete
accelerator.DefaultStream.Synchronize();
// get a SubView of a single voxel at 400, 783, 288
// (we can get larger slices, but see text)
// You also might need to use field.View.AsGeneral().SubView(...)
// depending on the stride of the buffer
var vicinity = field.View.SubView(new Index3D(400, 783, 288), new Index3D(1, 1, 1));
// copy it to CPU
vicinity.CopyToCPU(myCpu3dArray);
This works as expected for 1D buffers: I've verified that it is as fast as it should be and does NOT transfer the entire buffer. Note, however, that for 2D and 3D buffers the SubView method is not smart enough to grab only a rectangle (or box) of the size given by the extent (second) argument. Instead, SubView grabs every element from the starting point to the ending point implied by the extent, as if the array were flattened to 1D. So if we want a 2D or 3D rectangle, we have to iterate over multiple flattened slices (i.e., contiguous runs along the row-major dimension only) and copy them one at a time to get the whole rectangle.
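To make the "iterate multiple flattened slices" idea concrete, here is a minimal sketch of the index arithmetic only. It does not call ILGPU at all: BoxRuns and Enumerate are hypothetical names, and the layout is an assumption (X is the contiguous dimension, no padding, so linear = x + y * sizeX + z * sizeX * sizeY). Check the actual stride of your buffer before relying on it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ILGPU: lists the contiguous 1D runs
// that together cover a 3D box inside a flattened buffer.
public static class BoxRuns
{
    // ASSUMED layout: linear = x + y * size.X + z * size.X * size.Y
    // (X contiguous, no padding between rows or planes).
    public static IEnumerable<(long Offset, long Length)> Enumerate(
        (long X, long Y, long Z) size,
        (long X, long Y, long Z) start,
        (long X, long Y, long Z) extent)
    {
        // Validate eagerly, before the lazy iterator below is created.
        if (start.X < 0 || start.Y < 0 || start.Z < 0 ||
            extent.X <= 0 || extent.Y <= 0 || extent.Z <= 0 ||
            start.X + extent.X > size.X ||
            start.Y + extent.Y > size.Y ||
            start.Z + extent.Z > size.Z)
        {
            throw new ArgumentOutOfRangeException(
                nameof(extent), "The box must lie inside the buffer.");
        }
        return Iterate(size, start, extent);
    }

    private static IEnumerable<(long Offset, long Length)> Iterate(
        (long X, long Y, long Z) size,
        (long X, long Y, long Z) start,
        (long X, long Y, long Z) extent)
    {
        long planeSize = size.X * size.Y;
        for (long z = start.Z; z < start.Z + extent.Z; z++)
        {
            for (long y = start.Y; y < start.Y + extent.Y; y++)
            {
                // One run per (y, z) pair: extent.X elements along X.
                yield return (start.X + y * size.X + z * planeSize, extent.X);
            }
        }
    }
}
```

Each (Offset, Length) pair describes one flattened slice, so a box of extent (ex, ey, ez) needs ey * ez separate copies of ex elements each. That means many small transfers for a tall or deep box, so it may be worth comparing against copying one larger flattened range and discarding the extra elements on the CPU.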