Search code examples
objective-coptimizationmetal

Why is currentRenderPassDescriptor taking 8ms in my Metal draw routine?


I have a simple draw routine to draw a rotating triangle. And have diagnosed that most of the time getting the current MTLRenderPassDescriptor from the MTKView is taking 8ms. My code is in Objective-C and using manual reference counting. This is so I can communicate with my C++ game loop.

I'm calling this draw routine in an external while loop in C++ and there are no wait functions, so the loop restarts as soon as the last one is finished. At the end of my C++ loop, I manually poll all Cocoa events similar to GLFW.

Here is my code for the draw routine:

- (void)commitWithMTKView:(nonnull MTKView *)mtk_view CommandQueue:(nonnull id<MTLCommandQueue>)command_queue {
    NSLog(@"Begin Frame");
    [self updateState];
    NSLog(@"Updated State");
    
    @autoreleasepool {
        NSLog(@"Probe 1");
        MTLRenderPassDescriptor* render_pass_descriptor = mtk_view.currentRenderPassDescriptor;
        NSLog(@"Probe 2");
        if (render_pass_descriptor != nil) {
            id<MTLCommandBuffer> command_buffer = [command_queue commandBuffer];
            
//            Command Encoding
            id<MTLRenderCommandEncoder> render_encoder = [command_buffer renderCommandEncoderWithDescriptor:render_pass_descriptor];
            [render_encoder setRenderPipelineState:_render_pipeline_state];
//            
            [render_encoder setVertexBuffer:_vertex_buffers[_current_buffer]
                                     offset:0
                                    atIndex:ParallelTriangleRotate_vertexIndexVertices];
            
            [render_encoder setVertexBytes:&_aspect_ratio
                                    length:sizeof(_aspect_ratio)
                                   atIndex:ParallelTriangleRotate_vertexIndexAspectRatio];
            
            [render_encoder drawPrimitives:MTLPrimitiveTypeTriangle
                               vertexStart:0
                               vertexCount:3];
//            
            [render_encoder endEncoding];
            NSLog(@"Finished Encoded");
//
//            Commit in Drawable
            [command_buffer presentDrawable:mtk_view.currentDrawable];
            NSLog(@"Before Commit");
            [command_buffer commit];
            NSLog(@"After Commit");
            [command_buffer waitUntilScheduled];
            NSLog(@"Just Scheduled");
            [command_buffer waitUntilCompleted];
            NSLog(@"Completed Scheduled");
        }
    }
    NSLog(@"End Frame");
}

By visual inspection I found two bottlenecks. The primary one is between "Probe 1" and "Probe 2", which is getting the currentRenderPassDescriptor from the MTKView. We can see this in the time stamps of the log below.

2023-12-29 16:38:29.733 debug[64019:15574119] Begin Frame
2023-12-29 16:38:29.733 debug[64019:15574119] Updated State
2023-12-29 16:38:29.733 debug[64019:15574119] Probe 1
2023-12-29 16:38:29.741 debug[64019:15574119] Probe 2
2023-12-29 16:38:29.741 debug[64019:15574119] Finished Encoded
2023-12-29 16:38:29.741 debug[64019:15574119] Before Commit
2023-12-29 16:38:29.741 debug[64019:15574119] After Commit
2023-12-29 16:38:29.741 debug[64019:15574119] Just Scheduled
2023-12-29 16:38:29.742 debug[64019:15574119] Completed Scheduled
2023-12-29 16:38:29.742 debug[64019:15574119] End Frame

And the other is very rare and between "After Commit" and "Just Scheduled". This one makes more sense.

2023-12-29 16:45:50.024 debug[64199:15579939] Updated State
2023-12-29 16:45:50.024 debug[64199:15579939] Probe 1
2023-12-29 16:45:50.024 debug[64199:15579939] Probe 2
2023-12-29 16:45:50.024 debug[64199:15579939] Finished Encoded
2023-12-29 16:45:50.024 debug[64199:15579939] Before Commit
2023-12-29 16:45:50.024 debug[64199:15579939] After Commit
2023-12-29 16:45:50.024 debug[64199:15579939] Just Scheduled
2023-12-29 16:45:50.032 debug[64199:15579939] Completed Scheduled
2023-12-29 16:45:50.032 debug[64199:15579939] End Frame

This case happened extremely rarely and when it did, currentRenderPassDescriptor ran instantly.

Lastly I am setting (what I'm pretty sure is v-sync) to be turned off with mtk_view.enableSetNeedsDisplay = NO;

So after this rudimentary diagnosis, it looks like the stall is primarily coming from a simple pointer retrieval with the line MTLRenderPassDescriptor* render_pass_descriptor = mtk_view.currentRenderPassDescriptor;

Why is this? Can I optimize this? Should I store the pointer as a constant and update it when the view render pass descriptor changes?

EDIT:

I have also found that despite disabling mtk_view.enableSetNeedsDisplay, I am still getting v-sync behavior. I am manually measuring the frames per second in my C++ loop, and see that I am getting ~140 frames when the window is on my MacBook Retina display and ~70 frames when the window is on my Samsung display. This means that despite calling the draw function every loop in the unrestrained C++ loop, Metal is still stalling to maintain the frame rate. And from my diagnostic, it looks like it is stalling at MTLRenderPassDescriptor* render_pass_descriptor = mtk_view.currentRenderPassDescriptor; because this is taking 8ms and each frame of the C++ loop is taking 8ms. (This is 15ms when on the Samsung display).

EDIT 2:

For some reason when I set mtk_view.preferredFramesPerSecond = 10; when initializing the MTKView, I am getting extremely fast frames. But when I increase the frame rate to say mtk_view.preferredFramesPerSecond = 60; I get frames that still last to be 8ms. This seems to be inverted from what I expect.


Solution

  • currentRenderPassDescriptor of the MTKView asks its internal CAMetalLayer for the nextDrawable, which blocks waiting for the next drawable to be available.

    enableSetNeedsDisplay does not have anything to do with V-Sync and more so to do with when and how MTKView calls its delegate.

    The actual setting you are after is displaySyncEnabled. Keep in mind that it's only available on macOS.

    The reason you are seeing currentRenderPassDescriptor take 8 ms is because you are probably either running on an internal display on a MacBook Pro, which has a ProMotion display, which runs at up to 120 Hz or you are running on an external monitor which is also capable of 120 Hz. 120 Hz is equivalent to a screen being redrawn approximately every 8 ms.

    The reason you get "very fast frames" is because of how you are measuring it. Your frames are fast but they run not very often, because you are asking for 10 Hz updates and the compositor can basically give you all of the drawables instantly because the interval between you asking for drawables is more than the interval between the screen being drawn, so you always have at least two drawables, which means you will always get a new drawable instantly.

    Keep in mind that some behavior might change when running Direct or Composited, so enable Metal HUD to see which mode you are running in.