3d, rendering, vulkan, directx-12

Looking for tips on debugging strange point-light shadow artifacts


This has had me stuck for a while, so I'm hoping someone can provide some wisdom (or at least some tips on how to figure out what the heck is going on!).

I have a renderer that supports DX11, DX12 and Vulkan, with some (pretty basic, just simple cubemap depth) shadows for point lights. At lower frame rates these work just fine, but at crazy high frame rates, with the lights moving rapidly, I'm getting shadowing artifacts where it looks as if the shadow distance is slightly off relative to the light position. That's just a guess though, since if I pause, or do a frame capture in (for example) RenderDoc, the artifacts disappear. I haven't been able to grab a frame with the artifacts to debug, but I did manage to grab a screenshot. This only occurs when the light is moving away from the planar surface.


This only manifests with DX12 and Vulkan (not DX11), and it looks identical in both.

I fixed an issue a few weeks back where updates were out of order and the shadow generation was a frame behind the main rendering; this was pretty easy to repro and debug. This new case, not so much! Given the inability to repro when capturing, etc, it's been tough to track down.

I'd generally not be too concerned, since when rendering anything useful the frame rate is lower and there are no issues, but I'm worried that what I'm seeing is the result of something more systemic that'll bite me later.

Edit: So far, I've tried:

  • RenderDoc (unable to grab a capture manifesting this issue - either by queueing captures or hitting F12 when it manifests)

  • Debug visualizations - they show the issue, but as soon as I pause or perform a capture, the issue disappears.

  • Tracy profiler to show timings of when constants are updated and when tasks are scheduled. The updates and the consumption of the data are nowhere near overlapping.

  • Putting small pauses in the main application loop. This fixes the issue (even a 0.5 ms pause on the main thread, which drops the frame rate from ~800 fps to ~600 fps).


Solution

  • I finally figured this out, and, not surprisingly, it was a really silly mistake. But despite being rather dumb, it was also a fairly gnarly one to figure out, since there were no Vulkan validation errors/warnings, it was occurring both in DX12 and Vulkan, and was highly dependent on timing.

    I figured I should post the solution, since if anyone else happens to make this same mistake (or something similar), and stumbles on this question, I could save them some pain.

    So, here it is. My Vulkan uniform buffer implementation tries to be smart, and creates a queue of buffers that are tagged with the last frame in which they were consumed by rendering work. If there's an update to a uniform buffer, I check the front (oldest) element in the queue, and if its frame is older than my frame buffering count, I'll re-use it (overwriting the uniform buffer data) for the next frame, and toss that buffer to the back of the queue.

    However, I had an off-by-one error in that calculation (for both DX12 and Vulkan) because there's some overlap between frame workloads. That meant that, depending on timing, I'd overwrite a uniform buffer while the GPU was still using it, so the GPU would calculate point shadows with the next frame's light data. So, literally fixing that off-by-one, and adding an extra frame of buffering, fixed this annoying (random) issue.

    Side note: always, always run regularly with the Vulkan validation layers enabled. Whilst occasionally cryptic, they're incredibly thorough and (outside of this case) have helped me catch numerous issues that would have been a nightmare to track down otherwise.
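
For anyone who wants to see the shape of the bug, here is a minimal CPU-only sketch of the recycling check described above. It is not my actual renderer code: `UniformBufferPool`, `acquire`, `countHazards` and the `offByOne` flag are all hypothetical names, the integer `id` stands in for a real `VkBuffer` or `ID3D12Resource`, and the "worst-case GPU progress" value is an assumption standing in for whatever fence the renderer waits on. The direction of the off-by-one shown here (requiring one frame too few) is also an assumption, picked to illustrate the class of mistake.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a GPU uniform buffer plus its frame tag.
struct PooledBuffer {
    int id;                      // placeholder for a real buffer handle
    std::int64_t lastUsedFrame;  // last frame whose GPU work reads this buffer
};

// Result of asking the pool for a buffer to write this frame's data into.
struct Acquired {
    int id;
    bool recycled;               // true if an existing buffer was overwritten
    std::int64_t previousFrame;  // frame that last used it (valid if recycled)
};

class UniformBufferPool {
public:
    UniformBufferPool(std::int64_t framesInFlight, bool offByOne)
        : framesInFlight_(framesInFlight), offByOne_(offByOne) {
        assert(framesInFlight_ >= 1);
    }

    Acquired acquire(std::int64_t currentFrame) {
        // Correct rule (under the assumption below): the buffer must be at
        // least framesInFlight_ frames old. The buggy variant accepts one less.
        const std::int64_t requiredAge =
            offByOne_ ? framesInFlight_ - 1 : framesInFlight_;

        if (!queue_.empty()) {
            const PooledBuffer oldest = queue_.front();
            if (currentFrame - oldest.lastUsedFrame >= requiredAge) {
                // Re-tag the oldest buffer and move it to the back.
                queue_.pop_front();
                queue_.push_back({oldest.id, currentFrame});
                return {oldest.id, true, oldest.lastUsedFrame};
            }
        }
        // Nothing old enough: grow the pool instead of overwriting.
        const int id = nextId_++;
        queue_.push_back({id, currentFrame});
        return {id, false, -1};
    }

private:
    std::int64_t framesInFlight_;
    bool offByOne_;
    int nextId_ = 0;
    std::deque<PooledBuffer> queue_;  // front = oldest
};

// Counts recycles that could overwrite data the GPU may still be reading.
// Assumption: before recording frame F the CPU has only waited for frame
// F - framesInFlight, so that is the newest frame known to be finished.
std::int64_t countHazards(std::int64_t framesInFlight, bool offByOne,
                          std::int64_t frameCount) {
    UniformBufferPool pool(framesInFlight, offByOne);
    std::int64_t hazards = 0;
    for (std::int64_t frame = 0; frame < frameCount; ++frame) {
        const std::int64_t worstCaseGpuDone = frame - framesInFlight;
        const Acquired a = pool.acquire(frame);  // one uniform update per frame
        if (a.recycled && a.previousFrame > worstCaseGpuDone) {
            ++hazards;  // GPU might still be consuming this buffer
        }
    }
    return hazards;
}
```

Calling `countHazards(2, false, 100)` and `countHazards(2, true, 100)` compares the two checks under the same assumed frame pacing: the correct check never recycles a buffer the GPU might still be reading, while the off-by-one variant does. In a real renderer the hazard only shows up when the GPU actually lags that far behind, which would be consistent with the artifact vanishing when I paused, captured, or slowed the main loop.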