
How to synchronize a draw call with a dispatch call as late as possible?


I have a compute shader which updates a storage image which is later sampled by a fragment shader within a render pass.

From the Khronos Vulkan synchronization examples, I know I can insert a pipeline barrier before the render pass to make sure the fragment shader samples the image without hazards. Note that the example is modified slightly to include more draw calls.

vkCmdDispatch(...); // update the image

VkImageMemoryBarrier2KHR imageMemoryBarrier = {
  ...
  .srcStageMask = VK_PIPELINE_STAGE_2_COMPUTE_SHADER_BIT_KHR,
  .srcAccessMask = VK_ACCESS_2_SHADER_WRITE_BIT_KHR,
  .dstStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
  .dstAccessMask = VK_ACCESS_2_SHADER_READ_BIT_KHR,
  .oldLayout = VK_IMAGE_LAYOUT_GENERAL,
  .newLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL
  /* .image and .subresourceRange should identify image subresource accessed */};

VkDependencyInfoKHR dependencyInfo = {
    ...
    1,                      // imageMemoryBarrierCount
    &imageMemoryBarrier,    // pImageMemoryBarriers
    ...
};

vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);

... // Render pass setup etc.

vkCmdDraw(...); // does not sample the image
vkCmdDraw(...); // does not sample the image
vkCmdDraw(...); // does not sample the image

...

vkCmdDraw(...); // sample the image written by the compute shader, synchronized.

In this example, a number of draw calls within the same render pass do not need to synchronize with the compute shader. They merely render static geometry and textures that are not updated dynamically. Yet in this configuration they must still wait for the compute shader.

Ideally, I would like the independent draw calls between the vkCmdDispatch call and the last vkCmdDraw to be able to run concurrently with the compute shader.

If I understand the spec correctly, I can't put the same pipeline barrier within the render pass. An alternative I considered is to use an external subpass dependency and record the draw call that samples the texture in a second subpass. But I don't know whether this is a valid approach, and in any case it would be hard to maintain, since this configuration is hard-coded into the render pass object.

So is there a different synchronization approach that can achieve better concurrency?


Solution

  • You should express that barrier as an external subpass dependency for the subpass that needs it. However, unless the rendering commands you want to overlap with the compute shader are in a prior subpass in the dependency graph, this probably won't give you any better performance.

    Note that not even dynamic rendering helps you here, as vkCmdBeginRendering also starts a render pass instance. This means that you still can't have pipeline barriers or events within it.

    Essentially, collective rendering operations (either the render pass as a whole or the subpasses within it) define an inflexible synchronization boundary between themselves as a group and the outside world. You can put synchronization around them, but not within them.

    That being said, since rendering and compute operations both consume the same resources (shader stages), you probably weren't going to get much overlap anyway.