halide

Why does the last Func in a series start with halide_copy_to_host, and how can I remove it?


I have a program that creates a gradient image. If I compile it for my GPU and look at the output of compile_to_lowered_stmt, I see that it starts (right after the produce statement) with a halide_copy_to_host and only then enters the outer loop. If I nest functions, the halide_copy_to_host is in the same position, but for the outermost Func. Note that I do nothing with scheduling. I'd like to understand why it is at that location. I expected it at the end of the program, to copy the result back to the host, not before the calculations have even run. And if I want the result to stay on the GPU (e.g. to output it to the screen), the algorithm should run faster without the copy. Is there a way to "remove" the halide_copy_to_host?


Solution

  • The copy_to_host is probably there in case the output buffer is dirty on the GPU. If it were, and we didn't copy it to the host before writing to it there, you'd end up with a buffer that is dirty on both the CPU and the GPU, which is impossible to reconcile without tracking which parts of the buffer are dirty on the GPU and which on the CPU.

    However, copy_to_host is a no-op if the dev_dirty flag is set to false, so it shouldn't actually be doing anything in your case. I think it's likely that no copies are happening. If you enable the -debug target flag you can check for yourself.

    Halide doesn't insert a copy_to_host after GPU compute, precisely so that the result can stay on the GPU, for the reasons you give.
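
To make the dirty-flag reasoning above concrete, here is a minimal toy model in plain C++. It is not Halide's runtime code: ToyBuffer, toy_copy_to_host, toy_host_produce and toy_gpu_produce are hypothetical names invented for this sketch, and the only behaviour it assumes is what the answer describes (copy_to_host does nothing unless the device copy is dirty, and no copy back is issued after GPU compute).

```cpp
// Toy stand-in for a buffer's dirty flags. NOT Halide's real buffer type;
// every name in this block is hypothetical and exists only for illustration.
struct ToyBuffer {
    bool host_dirty = false;  // host copy has changes the device has not seen
    bool dev_dirty = false;   // device copy has changes the host has not seen
    int copies_to_host = 0;   // number of real device-to-host transfers
};

// Models the described copy_to_host: a no-op unless the device copy is newer.
void toy_copy_to_host(ToyBuffer &buf) {
    if (!buf.dev_dirty) {
        return;  // nothing newer on the device, so no transfer happens
    }
    buf.copies_to_host++;   // pretend the data was transferred
    buf.dev_dirty = false;  // host and device now agree
}

// Models a stage that writes the buffer on the host. The copy_to_host comes
// first so that the buffer can never end up dirty on both sides.
void toy_host_produce(ToyBuffer &buf) {
    toy_copy_to_host(buf);
    buf.host_dirty = true;
}

// Models a stage that writes the buffer on the GPU. No copy back is issued,
// so the result stays on the device until something on the host asks for it.
void toy_gpu_produce(ToyBuffer &buf) {
    buf.dev_dirty = true;
}
```

In this model, calling toy_host_produce on a fresh buffer leaves copies_to_host at 0, which corresponds to the case in the question: the call is present in the lowered statement but performs no transfer. A transfer only happens if an earlier toy_gpu_produce left the buffer dirty on the device.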