hlsl unreal-engine4 fragment-shader flow-control

Flow Control in HLSL

I recently read this paper about raymarching clouds (careful, it's a PDF, in case you don't want that: http://www.diva-portal.org/smash/get/diva2:1223894/FULLTEXT01.pdf), where the author goes into optimizing the algorithm via reprojection (page 22ff.). He states that by raymarching only 1/16th of all pixels per frame (the selected pixel hopping around in a 4x4 grid) and reprojecting the rest, he got roughly a tenfold performance increase.
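The per-frame pixel selection described above can be sketched in HLSL roughly as follows. This is a minimal sketch, not the paper's code: the function name `ShouldRaymarch`, the row-major hop order, and the assumption that `frameIndex` is a frame counter passed in by the application are all illustrative. `pixelCoord` would typically come from `SV_Position`.

```hlsl
// Hypothetical helper: true if this pixel is the one in its 4x4 block
// that is raymarched on frame `frameIndex`; all other pixels are reprojected.
// Assumes a simple row-major hop order (slots 0..15); the paper may use a
// different sequence, e.g. one that spreads consecutive samples apart.
bool ShouldRaymarch(uint2 pixelCoord, uint frameIndex)
{
    uint  slot = frameIndex % 16;   // which of the 16 cells is due this frame
    uint2 cell = pixelCoord % 4;    // position of this pixel inside its 4x4 block
    return (cell.y * 4 + cell.x) == slot;
}
```

With this order, each pixel is raymarched once every 16 frames, and on any given frame exactly one pixel per 4x4 block (1/16 of the screen) is selected.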

I have now tried implementing this in Unreal Engine 4 (custom HLSL shader), and I have both the raymarching and the reprojection working. However, I'm stuck at actually running the raymarching only on the necessary pixels. As far as I'm aware, with any branching in HLSL both sides of the branch will be calculated and one will be thrown away. I therefore can't do something like this pseudo code in the pixel shader: if (PixelReprojection) { return 0; } else { return Raymarch(...); }, as it will calculate the Raymarch even for pixels that are getting reprojected.

I don't see any other way of achieving this, though... Is there any kind of branching in HLSL that allows this? It can't be static, as the pixels subjected to raymarching and reprojection change every frame. I'm really curious about any ideas on how the author could have achieved this tenfold performance increase, since, as far as I'm aware, he is writing the code for the GPU too.

I'd greatly appreciate any input here.

Regards, foodius

Solution

TLDR: use the attribute [branch] in front of your if-statement.

As far as I'm aware, with any branching in HLSL both sides of the branch will be calculated and one will be thrown away

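This is not entirely correct. Yes, a branch can be flattened, which means that both sides are calculated as you described, but it can also be left unflattened (this is called dynamic branching), in which case, as long as the threads agree, only the side that is actually taken gets executed.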
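To make the difference concrete: a flattened branch behaves roughly like the following hypothetical function, where both sides are always evaluated and a select picks one result afterwards. `Raymarch`, `Reproject`, and `ShadeFlattened` are placeholder names, not UE4 or HLSL built-ins.

```hlsl
// Hypothetical stand-ins for the expensive and the cheap path.
float4 Raymarch(float2 uv)  { return float4(uv, 0, 1); }
float4 Reproject(float2 uv) { return float4(0, 0, 0, 1); }

// What a flattened branch amounts to: both sides run, one result is kept.
float4 ShadeFlattened(bool needsRaymarch, float2 uv)
{
    float4 marched     = Raymarch(uv);   // always executed, even if discarded
    float4 reprojected = Reproject(uv);  // always executed, even if discarded
    return needsRaymarch ? marched : reprojected;  // select one of the two results
}
```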

Now, not flattening a branch has a disadvantage: all threads in a wave execute the same instructions in lockstep, so if two threads in the same wave take different paths in the branch, the wave has to execute both paths one after the other. While it runs one path, the threads that took the other path are "disabled" (masked out), meaning they step through the same code as the other threads in their wave but do not actually write anything to memory. Nonetheless, this dynamic kind of branching may still be faster than always running both sides of the branch, but this depends on the actual code and on how often threads within a wave disagree.

One can even avoid this disadvantage through smart shader design, namely by ensuring that the threads that take the same side of the branch are in the same wave, so that no divergence happens inside a wave. This, however, requires some knowledge of the underlying hardware, such as the wave size.
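One way to follow this advice for the 4x4 pattern, sketched below under several assumptions, is to avoid per-pixel branching altogether: launch one compute thread per 4x4 block and let each thread raymarch only the pixel that is due this frame, so every thread in a wave takes the same path. This is not necessarily what the paper does; `CloudHistory`, `FrameIndex`, `CSMain`, and the placeholder `Raymarch` are hypothetical names, the row-major hop order is an assumption, and in UE4 this would presumably require a separate compute pass rather than a Custom material node.

```hlsl
// Hypothetical full-resolution target that holds the cloud result.
RWTexture2D<float4> CloudHistory : register(u0);

cbuffer FrameConstants : register(b0)
{
    uint FrameIndex;   // hypothetical frame counter supplied by the application
};

// Placeholder for the actual cloud raymarch.
float4 Raymarch(uint2 pixelCoord)
{
    return float4(0, 0, 0, 1);
}

// One thread per 4x4 block: every thread raymarches, so no thread in a wave
// diverges onto a "reproject" path.
[numthreads(8, 8, 1)]
void CSMain(uint3 blockId : SV_DispatchThreadID)
{
    uint  slot   = FrameIndex % 16;                // cell that is due this frame
    uint2 offset = uint2(slot % 4, slot / 4);      // row-major position in the block
    uint2 pixelCoord = blockId.xy * 4 + offset;

    uint width, height;
    CloudHistory.GetDimensions(width, height);
    if (pixelCoord.x >= width || pixelCoord.y >= height)
        return;                                    // block hangs over the edge

    CloudHistory[pixelCoord] = Raymarch(pixelCoord);
}
```

With `[numthreads(8, 8, 1)]`, each thread group covers a 32x32 pixel region, so the dispatch would be `ceil(width / 32)` by `ceil(height / 32)` groups; the remaining 15 of every 16 pixels would then be filled in by a separate reprojection pass.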

In any case: unless told otherwise, the HLSL compiler decides on its own whether a branch uses dynamic branching or is flattened. One can, however, enforce one of the two by adding an attribute to the if statement, e.g.:

//Enforce dynamic branching:
[branch] 
if (...) { ... }
else { ... }

//Enforce flattening of the branch:
[flatten]
if (...) { ... }
else { ... }
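Applied to the pseudo code from the question, a sketch could look like this. Everything except the `[branch]` attribute and the HLSL built-ins is a hypothetical placeholder (`NoiseTex`, `HistoryTex`, `LinearSampler`, `FrameIndex`, `Raymarch`, `Reproject`, `PSMain`), and the row-major hop order is an assumption. The loop uses `SampleLevel` instead of `Sample` because sampling with implicit derivatives inside a dynamic branch is problematic: derivatives are undefined for disabled threads, and the compiler may warn or refuse.

```hlsl
Texture3D<float>  NoiseTex      : register(t0);  // hypothetical cloud density noise
Texture2D<float4> HistoryTex    : register(t1);  // hypothetical previous-frame result
SamplerState      LinearSampler : register(s0);

cbuffer FrameConstants : register(b0)
{
    uint FrameIndex;   // hypothetical frame counter supplied by the application
};

// Placeholder march with an explicit LOD, so no derivatives are needed.
float4 Raymarch(float2 uv)
{
    float density = 0;
    [loop]
    for (int i = 0; i < 64; ++i)
        density += NoiseTex.SampleLevel(LinearSampler, float3(uv, i / 64.0), 0);
    return float4(density.xxx / 64.0, 1);
}

// Placeholder: a real version would offset uv by the camera motion.
float4 Reproject(float2 uv)
{
    return HistoryTex.SampleLevel(LinearSampler, uv, 0);
}

float4 PSMain(float4 svPos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    uint2 cell = uint2(svPos.xy) % 4;                        // position in 4x4 block
    bool needsRaymarch = (cell.y * 4 + cell.x) == (FrameIndex % 16);

    float4 color;
    [branch]   // request real dynamic branching instead of flattening
    if (needsRaymarch)
        color = Raymarch(uv);
    else
        color = Reproject(uv);
    return color;
}
```

Keep the divergence caveat above in mind: assuming a wave covers a compact group of pixels (for example 32 or 64 pixels made of 2x2 quads, which is hardware-dependent), a 1-in-16 interleaved pattern will place at least one raymarched pixel in nearly every wave. Such a wave then runs the `Raymarch` side with most of its threads disabled, so the gain from `[branch]` alone may be far smaller than the reduction in raymarched pixels suggests; grouping the raymarched pixels into their own waves, as described above, avoids this.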