Note: In many ways, this is a follow-up to How do you synchronize a Metal Performance Shader with an MTLBlitCommandEncoder?
I'm still a bit confused about when explicit synchronization is required between sequential command encoders and when Metal's architecture makes it unnecessary.
In the question linked above, Apple's documentation is quoted as saying:
Memory Barriers
Between Command Encoders
All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.
I interpret this as implying that an MTLRenderCommandEncoder does not need explicit synchronization with a previous MTLBlitCommandEncoder if they are both in the same command buffer and occur one after the other.
However, Apple's own sample code appears to contradict that. In Image Filter Graph with Heaps and Fences, an MTLFence is used to synchronize access to a texture that is first written by an MTLBlitCommandEncoder and then used by two sequential MTLComputeCommandEncoder passes (one for the horizontal blur and a second for the vertical blur).
See:
AAPLFilter.m (L:199)
AAPLRenderer.m (L:413)
These command encoders are being executed within the same command buffer. Why does the first MTLComputeCommandEncoder need to explicitly wait for the blit to finish, and why does the second compute encoder need to wait for the first compute encoder if, as quoted above, "All resource writes performed in a given command encoder are visible in the next command encoder."?
Pseudo Sample Code:
- (void)drawInMTKView:(nonnull MTKView *)view {
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    id<MTLTexture> masterTexture = self.masterTexture;
    id<MTLTexture> incomingTexture = [self dequeueRenderedTextureIfPresent];

    // Copy the incoming texture into the master texture.
    id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
    [blitEncoder copyFromTexture:incomingTexture ... toTexture:masterTexture];
    [blitEncoder endEncoding];

    id<MTLRenderCommandEncoder> renderEncoder =
        [commandBuffer renderCommandEncoderWithDescriptor:view.currentRenderPassDescriptor];
    // Is synchronization with the blit encoder required here?
    //
    // The fragment shader is going to sample from masterTexture and will
    // expect that the blit command above will have been completed.
    [renderEncoder setFragmentTexture:masterTexture atIndex:0];
    [renderEncoder drawPrimitives:...];
    // Every encoder must be ended before the command buffer is committed.
    [renderEncoder endEncoding];

    [commandBuffer commit];
}
In the pseudo-code above, does the render command encoder have to explicitly wait for the blit command encoder to finish? From the answer to my previous question, I'm led to believe that the answer is "No". But looking at Apple's sample code for using fences and events, I'm led to believe that the answer is "Yes".
If synchronization is not required, then what is different between this pseudo-code and Apple's sample code?
Edit #1:
Thanks to Ken's answer below, I quickly found a relevant thread on Apple's Developer Forum that covers this exact issue.
Apple Developer Forum: MTLFence detailed behaviour?
As Ken correctly points out, the key detail to understand is the difference between a tracked texture and an untracked texture.
This manual synchronization is only needed for resources that Metal doesn't automatically track. Resources allocated from MTLHeaps are not automatically tracked. Resources explicitly created with the MTLResourceHazardTrackingModeUntracked option are also untracked.
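To make the tracked/untracked distinction concrete, here is a minimal sketch that creates one texture with default options and one with MTLResourceHazardTrackingModeUntracked, then inspects each through the hazardTrackingMode property. The helper names LogTracking and CompareTrackingModes, the pixel format, and the texture size are assumptions chosen for illustration and do not come from Apple's sample. The hazardTrackingMode property also requires macOS 10.15 or iOS 13 and later.

```objc
#import <Metal/Metal.h>

// Hypothetical helper (not from Apple's sample): logs whether Metal
// automatically tracks hazards for the given resource.
static void LogTracking(NSString *label, id<MTLResource> resource) {
    BOOL tracked = (resource.hazardTrackingMode == MTLHazardTrackingModeTracked);
    NSLog(@"%@ is %@", label, tracked ? @"tracked" : @"untracked");
}

// Hypothetical helper: creates one texture per tracking mode and logs each.
static void CompareTrackingModes(id<MTLDevice> device) {
    // Pixel format and size are arbitrary illustration values.
    MTLTextureDescriptor *descriptor =
        [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                           width:256
                                                          height:256
                                                       mipmapped:NO];
    descriptor.usage = MTLTextureUsageShaderRead;

    // Allocated from the device with default options: Metal tracks hazards,
    // so no explicit fence is needed between encoders.
    id<MTLTexture> trackedTexture = [device newTextureWithDescriptor:descriptor];
    LogTracking(@"Default device texture", trackedTexture);

    // Explicitly opting out of hazard tracking: the app is now responsible
    // for synchronizing access (for example with an MTLFence).
    descriptor.resourceOptions =
        MTLResourceStorageModePrivate | MTLResourceHazardTrackingModeUntracked;
    id<MTLTexture> untrackedTexture = [device newTextureWithDescriptor:descriptor];
    LogTracking(@"Untracked device texture", untrackedTexture);
}
```

Calling CompareTrackingModes(MTLCreateSystemDefaultDevice()) is one way to try this. Textures allocated from an MTLHeap with default options should report as untracked in the same way.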
From the overview of the Image Filter Graph with Heaps and Fences sample you linked, under Optimize Resource Allocation and Performance:
When resources are allocated from a device, Metal creates and tracks additional state to ensure that the resource memory is allocated, synchronized, and made available throughout the lifetime of any command buffer that needs the given resource. It does so even if the resource itself is destroyed before the command buffer begins execution.
Although Metal also carries out this process for heaps, it doesn’t do so for resources within the heap. Instead, the app must perform explicit fine-grained synchronization when it creates objects from the heap and reuses memory.
The pseudo-code in your question would only need explicit synchronization if the resources are untracked.
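For the untracked case, the following sketch shows roughly how the blit-then-render sequence from the question could be synchronized with an MTLFence. The function name EncodeBlitThenRender and its parameters are hypothetical. It assumes both textures have the same size and pixel format (so the whole-texture copy is valid), that the fence was created earlier with [device newFence], and that the caller commits the command buffer afterwards.

```objc
#import <Metal/Metal.h>

// Hypothetical function: encodes a blit into masterTexture followed by a
// render pass that samples it, using a fence because masterTexture is
// assumed to be untracked (heap-allocated or created with
// MTLResourceHazardTrackingModeUntracked).
static void EncodeBlitThenRender(id<MTLCommandBuffer> commandBuffer,
                                 id<MTLFence> fence,
                                 id<MTLTexture> incomingTexture,
                                 id<MTLTexture> masterTexture,
                                 MTLRenderPassDescriptor *passDescriptor) {
    id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
    // Whole-texture copy; requires macOS 10.15 / iOS 13 or later.
    [blitEncoder copyFromTexture:incomingTexture toTexture:masterTexture];
    // Signal the fence once the blit encoder's work has completed.
    [blitEncoder updateFence:fence];
    [blitEncoder endEncoding];

    id<MTLRenderCommandEncoder> renderEncoder =
        [commandBuffer renderCommandEncoderWithDescriptor:passDescriptor];
    // Hold back the fragment stage until the blit has updated the fence.
    [renderEncoder waitForFence:fence beforeStages:MTLRenderStageFragment];
    [renderEncoder setFragmentTexture:masterTexture atIndex:0];
    // Set the pipeline state and issue draw calls here (omitted).
    [renderEncoder endEncoding];
}
```

If masterTexture is an ordinary tracked texture allocated from the device, the updateFence: and waitForFence:beforeStages: calls can be dropped, since Metal orders the two encoders automatically.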