How does the Metal triple buffering model help in the official examples?

The Metal Best Practices Guide suggests using triple buffering for dynamic data buffers. But the listing provided in the documentation and the default Metal example generated by Xcode both appear to block every frame, waiting for the GPU to finish its work:

- (void)render
{
    // Wait until the inflight command buffer has completed its work
    dispatch_semaphore_wait(_frameBoundarySemaphore, DISPATCH_TIME_FOREVER);

    // TODO: Update dynamic buffers and send them to the GPU here !

    __weak dispatch_semaphore_t semaphore = _frameBoundarySemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> commandBuffer) {
        // GPU work is complete
        // Signal the semaphore to start the CPU work
        dispatch_semaphore_signal(semaphore);
    }];

    // CPU work is complete
    // Commit the command buffer and start the GPU work
    [commandBuffer commit];

}

So how does triple buffering improve anything here?

Solution

The important bit you missed in the sample is how the semaphore is created:

_frameBoundarySemaphore = dispatch_semaphore_create(kMaxInflightBuffers);

As the documentation for dispatch_semaphore_create says:

Passing a value greater than zero is useful for managing a finite pool of resources, where the pool size is equal to the value.

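kMaxInflightBuffers is set to 3 for triple buffering, so the first 3 calls to dispatch_semaphore_wait succeed without any waiting. The CPU blocks only when three command buffers are already in flight, and each completion handler's dispatch_semaphore_signal frees one slot again.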
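To make the counting behaviour concrete, here is a minimal single-threaded sketch in plain C (which also compiles as Objective-C). It does not use Metal or libdispatch: ModelSemaphore, model_try_wait, model_signal, buffer_index_for_frame and count_stalled_frames are hypothetical names invented for illustration, and the model assumes the worst case where the GPU finishes a frame only when the CPU is forced to wait for it.

```c
#include <assert.h>
#include <stdbool.h>

enum { kMaxInflightBuffers = 3 };  /* same value the sample uses */

/* Hypothetical stand-in for a counting semaphore (not libdispatch). */
typedef struct {
    int count;  /* number of free slots in the pool */
} ModelSemaphore;

/* Non-blocking wait: takes a slot if one is free. A false result marks
   the point where a real dispatch_semaphore_wait would block. */
static bool model_try_wait(ModelSemaphore *s) {
    if (s->count == 0) {
        return false;
    }
    s->count--;
    return true;
}

/* Models dispatch_semaphore_signal: returns one slot to the pool. */
static void model_signal(ModelSemaphore *s) {
    s->count++;
}

/* Index of the dynamic buffer the CPU may overwrite for a given frame. */
static int buffer_index_for_frame(int frame) {
    assert(frame >= 0);
    return frame % kMaxInflightBuffers;
}

/* Encodes `frames` frames back to back and returns how many of them
   had to wait for an earlier frame to complete first. */
static int count_stalled_frames(int frames) {
    ModelSemaphore sem = { kMaxInflightBuffers };
    int stalled = 0;
    for (int frame = 0; frame < frames; frame++) {
        if (!model_try_wait(&sem)) {
            stalled++;
            model_signal(&sem);          /* oldest in-flight frame completes */
            (void)model_try_wait(&sem);  /* the freed slot is taken at once */
        }
        (void)buffer_index_for_frame(frame);  /* buffer written this frame */
    }
    return stalled;
}
```

In this model, count_stalled_frames(3) reports no stalls, and a fourth frame is the first one that has to wait. In a real render loop the GPU usually completes frames while the CPU is still encoding, so the wait often returns immediately even after the first three frames.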