When is an MTLFence or MTLEvent required for synchronization between command encoders?


Note: In many ways, this is a follow-up to "How do you synchronize a Metal Performance Shader with an MTLBlitCommandEncoder?"

I'm still a bit confused about when explicit synchronization is required between sequential command encoders and when it is unnecessary because of Metal's architecture.

In the question linked above, Apple's documentation is quoted as saying:

Memory Barriers

Between Command Encoders

All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.

I interpret this as implying that an MTLRenderCommandEncoder does not need explicit synchronization with a previous MTLBlitCommandEncoder if they are both in the same command buffer and occur one after the other.

However, Apple's own sample code appears to contradict that. Image Filter Graph with Heaps and Fences uses an MTLFence to synchronize access to a texture that is written first by an MTLBlitCommandEncoder and then used by two sequential MTLComputeCommandEncoders (one for the horizontal blur and a second for the vertical blur).

See:
AAPLFilter.m (L:199)
AAPLRenderer.m (L:413)

These command encoders are being executed within the same command buffer. Why does the first MTLComputeCommandEncoder need to explicitly wait for the blit to finish and why does the second compute encoder need to wait for the first compute encoder if, as quoted above, "All resource writes performed in a given command encoder are visible in the next command encoder."?

Pseudo Sample Code:

- (void)drawInMTKView:(nonnull MTKView *)view {

  id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];

  id<MTLTexture> masterTexture = self.masterTexture;
  id<MTLTexture> incomingTexture = [self dequeueRenderedTextureIfPresent];

  id<MTLBlitCommandEncoder> blitEncoder = commandBuffer.blitCommandEncoder;
  [blitEncoder copyFromTexture:incomingTexture ... toTexture:masterTexture];
  [blitEncoder endEncoding];

  id <MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:view.currentRenderPassDescriptor];

  // Is synchronization with the blit encoder required here? 
  // 
  // The fragment shader is going to sample from masterTexture and will
  // expect that the blit command above will have been completed.

  [renderEncoder setFragmentTexture:masterTexture atIndex:0];

  [renderEncoder drawPrimitives:...];
  [renderEncoder endEncoding]; // an encoder must be ended before the command buffer is committed
  [commandBuffer commit];
}

In the pseudo-code above, does the render command encoder have to explicitly wait for the blit command encoder to finish? The answer to my previous question leads me to believe that the answer is "No". But Apple's sample code for using fences and events leads me to believe that the answer is "Yes".

If synchronization is not required, then what is different between this pseudo-code and Apple's sample code?

Edit #1:

Thanks to Ken's answer below, I quickly found a relevant thread on Apple's Developer Forum that covers this exact issue.

Apple Developer Forum: MTLFence detailed behaviour?

As Ken correctly points out, the key detail to understand is the difference between a tracked texture and an untracked texture.


Solution

  • Manual synchronization with an MTLFence or MTLEvent is only needed for resources that Metal doesn't automatically track. Resources allocated from an MTLHeap are not automatically tracked by default. Resources explicitly created with the MTLResourceHazardTrackingModeUntracked option are also untracked.

    From the overview of the Image Filter Graph with Heaps and Fences sample you linked, under Optimize Resource Allocation and Performance:

    When resources are allocated from a device, Metal creates and tracks additional state to ensure that the resource memory is allocated, synchronized, and made available throughout the lifetime of any command buffer that needs the given resource. It does so even if the resource itself is destroyed before the command buffer begins execution.

    Although Metal also carries out this process for heaps, it doesn’t do so for resources within the heap. Instead, the app must perform explicit fine-grained synchronization when it creates objects from the heap and reuses memory.

    The pseudo-code in your question would only need explicit synchronization if the resources are untracked.
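
    As a minimal sketch of that untracked case: assuming masterTexture was allocated from an MTLHeap and fence was created with [device newFence], the blit encoder can update the fence and the render encoder can wait on it before its fragment stage. The function name and its parameters are hypothetical and are not taken from Apple's sample.

    ```objc
    #import <Metal/Metal.h>

    // Hypothetical helper, not taken from Apple's sample.
    // Assumes `masterTexture` is untracked (for example, allocated from an
    // MTLHeap) and `fence` was created with [device newFence].
    static void EncodeCopyThenSample(id<MTLCommandBuffer> commandBuffer,
                                     id<MTLFence> fence,
                                     id<MTLTexture> incomingTexture,
                                     id<MTLTexture> masterTexture,
                                     MTLRenderPassDescriptor *renderPass)
    {
        id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
        [blitEncoder copyFromTexture:incomingTexture
                         sourceSlice:0
                         sourceLevel:0
                        sourceOrigin:MTLOriginMake(0, 0, 0)
                          sourceSize:MTLSizeMake(incomingTexture.width,
                                                 incomingTexture.height, 1)
                           toTexture:masterTexture
                    destinationSlice:0
                    destinationLevel:0
                   destinationOrigin:MTLOriginMake(0, 0, 0)];
        // Signal the fence once the blit encoder's work has completed.
        [blitEncoder updateFence:fence];
        [blitEncoder endEncoding];

        id<MTLRenderCommandEncoder> renderEncoder =
            [commandBuffer renderCommandEncoderWithDescriptor:renderPass];
        // Hold back fragment work until the blit has written masterTexture.
        [renderEncoder waitForFence:fence beforeStages:MTLRenderStageFragment];
        [renderEncoder setFragmentTexture:masterTexture atIndex:0];
        // Pipeline state and draw calls omitted.
        [renderEncoder endEncoding];
    }
    ```

    If masterTexture is a tracked resource, for example one created directly from the device with default options, the updateFence: and waitForFence:beforeStages: calls can be dropped and Metal orders the two encoders itself.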