c# gpu

ILGPU CopyToCpu slice


I want to quickly copy a small piece (several bytes) of a huge GPU buffer to the CPU. ILGPU's CopyToCPU takes a long time because it copies the entire buffer. Is there a way in ILGPU to copy only a small slice of an array, for speed's sake?


Solution

  • Found the answer:

    // (This is CPU code)
    
    // define the dimensions of a cubical 3D buffer
    // (10^9 voxels, i.e. about 1 GB at one byte per voxel)
    var fieldSize = new Index3D(1000, 1000, 1000);
    
    // allocate that buffer in the GPU
    using var field = accelerator.Allocate3DDenseXY<VoxelType>(fieldSize);
    
    // run the kernel
    kernel(fieldSize, ...);
    
    // wait for it to complete
    accelerator.DefaultStream.Synchronize();
    
    // get a SubView of a single voxel at 400, 783, 288
    // (we can get larger slices, but see text)
    // You also might need to use field.View.AsGeneral().SubView(...)
    // depending on the stride of the buffer
    var vicinity = field.View.SubView(new Index3D(400, 783, 288), new Index3D(1, 1, 1));
    
    // copy it to CPU
    vicinity.CopyToCPU(myCpu3dArray);
    

    This works as expected for 1D buffers: I've verified that the copy is as fast as it should be and does NOT transfer the entire buffer. Note, however, that for 2D and 3D buffers SubView does not grab only the rectangle described by the extent (second) argument. Instead, it grabs every element from the starting point to the ending point implied by the extent, as if the array were flattened to 1D. So to get a 2D or 3D rectangle, we have to iterate over multiple flattened slices (i.e., contiguous runs along the row-major dimension only) and copy each one to collect the whole rectangle.
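
To make the "multiple flattened slices" idea concrete, here is a minimal sketch of the index arithmetic only. It does not call ILGPU. `BoxRuns` and `ContiguousRuns` are hypothetical names, and the layout formula (X varies fastest, so `linear = x + y * width + z * width * height`) is an assumption that should be checked against the stride of your actual buffer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ILGPU: lists the contiguous 1D runs
// that together cover a 3D box inside a densely packed 3D buffer.
static class BoxRuns
{
    // ASSUMPTION: X is the fastest-varying dimension, i.e.
    // linear = x + y * size.X + z * size.X * size.Y.
    public static IEnumerable<(long Offset, long Length)> ContiguousRuns(
        (long X, long Y, long Z) size,    // full buffer dimensions
        (long X, long Y, long Z) start,   // first voxel of the box
        (long X, long Y, long Z) extent)  // box dimensions
    {
        for (long z = start.Z; z < start.Z + extent.Z; z++)
        {
            for (long y = start.Y; y < start.Y + extent.Y; y++)
            {
                // One run per (y, z) pair: extent.X consecutive elements.
                long offset = start.X + y * size.X + z * size.X * size.Y;
                yield return (offset, extent.X);
            }
        }
    }
}
```

For a 2 x 2 x 2 box starting at (400, 783, 288) in the 1000 x 1000 x 1000 buffer above, this yields four runs of two elements each, the first at flat offset 288783400. Each run would then be copied with its own 1D sub-view, so the number of copies grows with the Y and Z extents of the box.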