I'm currently trying to implement a path tracer inside a fragment shader which leverages a very simple BVH.
The code for the BVH intersection is based on the following idea:
bool BVHintersect( Ray ray ) {
    Object closestObject;
    vec2 toVisit[100]; // stack that keeps track of which nodes should be tested against the current ray
    int stackPointer = 1;
    toVisit[0] = vec2(0.0, 0.0); // coordinates of the root node in the BVH hierarchy
    while(stackPointer > 0) {
        stackPointer--; // pop the BVH node to examine
        if(!leaf) {
            // examine the BVH node and eventually update the stackPointer and toVisit
        }
        if(leaf) {
            // examine the leaf and eventually update the closestObject entry
        }
    }
}
The problem with the above code is that something very strange starts to happen on the second light bounce, assuming I'm calculating light bounces this way:
vec3 color = vec3(0.0);
vec3 normal = vec3(0.0);
// first light bounce
bool intersects = BVHintersect(ro, rd, color, normal);
vec3 lightPos = vec3(5, 15, 0);
// updating ray origin & direction
ro = ro + rd * (t - 0.01);
rd = normalize(lightPos - ro);
// second light bounce used only to calculate shadows
bool shadowIntersects = BVHintersect(ro, rd, color, normal);
The second call to BVHintersect appears to run indefinitely, as if the while loop never exits. However, from many tests I've done on that second call, I'm sure that stackPointer eventually goes back to 0. In fact, if I add the following iteration counter to the while loop:
int iterationsMade = 0;
while(stackPointer > 0) {
    iterationsMade++;
    if(iterationsMade > 100) {
        break;
    }
    // the rest of the loop
}
// after the function ends it also returns "iterationsMade"
the variable "iterationsMade" is always under 100, so the while loop doesn't run infinitely. Performance-wise, however, it's as if I did 100 iterations, even though "iterationsMade" is never bigger than, say, 10 or 20. Increasing the hardcoded 100 to a bigger value degrades performance linearly.
What could be the possible causes for this behaviour? What's a possible reason for that second call to BVHintersect to get stuck inside that while loop if it never does more than 10-20 iterations?
Source for the BVHintersect function: https://pastebin.com/60SYRQAZ
So, there's a funny thing about loops in shaders (or most SIMD circumstances):
The entire wave will take at least as long to execute as the slowest thread. So, if one thread needs to take ~100 iterations, then they ALL take 100 iterations. Depending on your platform and compiler, the loop may be unrolled to 100 iterations (or whatever upper bound you choose). Anything after the break won't affect the final output, but the rest of the unrolled loop will still have to be processed. Early-out isn't always possible.
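To make the lockstep cost concrete, here is a rough CPU-side model in C. It is not shader code, just a sketch of the arithmetic. The function name `wave_cost`, its parameters, and the example lane counts are all made up for illustration; real hardware and compilers may behave differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: how many loop iterations a whole wave pays for.
 * iters[lane] is the number of iterations each lane actually needs,
 * cap is the hardcoded upper bound on the loop. */
int wave_cost(const int *iters, int lane_count, int cap, bool unrolled) {
    assert(lane_count > 0 && cap > 0);

    /* Assumption: a fully unrolled loop executes every iteration up to
     * the cap, no matter when each lane hits its break. */
    if (unrolled) {
        return cap;
    }

    /* Otherwise the wave runs until its slowest lane is done
     * (or until the cap stops that lane). */
    int slowest = 0;
    for (int lane = 0; lane < lane_count; lane++) {
        int needed = iters[lane] < cap ? iters[lane] : cap;
        if (needed > slowest) {
            slowest = needed;
        }
    }
    return slowest;
}
```

With lanes needing between 9 and 20 iterations and a cap of 100, this model charges the wave 20 iterations when the loop stays dynamic and 100 when it is unrolled, which would match the "performance scales with the hardcoded cap" symptom in the question.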
There are a number of ways around this, but perhaps the most straightforward is to do this in multiple passes with a lower max iterations value.
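As a rough idea of what "multiple passes" could look like, here is a CPU-side sketch in C of a resumable stack traversal. Everything in it is an assumption for illustration: the `TraversalState` struct, the function names, and the implicit binary tree (children of node `i` at `2*i+1` and `2*i+2`) standing in for a real BVH. In a shader, the state would have to be stored somewhere between passes, for example in a texture.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_CAPACITY 100

/* Hypothetical traversal state that survives between passes. */
typedef struct {
    int stack[STACK_CAPACITY];
    int stack_pointer;
    int nodes_visited;
} TraversalState;

void traversal_begin(TraversalState *s, int root) {
    s->stack[0] = root;
    s->stack_pointer = 1;
    s->nodes_visited = 0;
}

/* Runs at most max_iterations loop iterations, then stops.
 * Returns true once the stack is empty (traversal finished). */
bool traversal_step(TraversalState *s, int node_count, int max_iterations) {
    assert(max_iterations > 0);
    for (int i = 0; i < max_iterations && s->stack_pointer > 0; i++) {
        int node = s->stack[--s->stack_pointer]; /* pop the node to examine */
        s->nodes_visited++;

        /* Stand-in for the ray/box test: push every existing child. */
        int left = 2 * node + 1;
        int right = 2 * node + 2;
        assert(s->stack_pointer + 2 <= STACK_CAPACITY);
        if (right < node_count) s->stack[s->stack_pointer++] = right;
        if (left < node_count)  s->stack[s->stack_pointer++] = left;
    }
    return s->stack_pointer == 0;
}
```

Each call plays the role of one pass with a small loop bound; the caller keeps invoking `traversal_step` until it returns true. The trade-off is extra bookkeeping to save and restore the stack, in exchange for a much lower per-pass worst case.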
I would also run your shader through a compiler and look at the generated code. Compare versions built with different max iteration values and look at things like the length of the compiled shader.