When is it safe to write over and reuse a MTLBuffer or other Metal vertex buffer?

I'm just getting started with Metal, and am having trouble grasping some basic things. I've been reading a whole bunch of web pages about Metal, and working through Apple's examples, and so forth, but gaps in my understanding remain. I think my key point of confusion is: what is the right way to handle vertex buffers, and how do I know when it's safe to reuse them? This confusion manifests in several ways, as I'll describe below, and maybe those different manifestations of my confusion need to be addressed in different ways.

To be more specific, I'm using an MTKView subclass in Objective-C on macOS to display very simple 2D shapes: an overall frame for the view with a background color inside, 0+ rectangular subframes inside that overall frame with a different background color inside them, and then 0+ flat-shaded squares of various colors inside each subframe. My vertex function is just a simple coordinate transformation, and my fragment function just passes through the color it receives, based on Apple's triangle demo app. I have this working fine for a single subframe with a single square. So far so good.

There are several things that puzzle me.

One: I could design my code to render the whole thing with a single vertex buffer and a single call to drawPrimitives:, drawing all of the (sub)frames and squares in one big bang. This is not optimal, though, as it breaks the encapsulation of my code, in which each subframe represents the state of one object (the thing that contains the 0+ squares); I'd like to allow each object to be responsible for drawing its own contents. It would be nice, therefore, to have each object set up a vertex buffer and make its own drawPrimitives: call. But since the objects will draw sequentially (this is a single-threaded app), I'd like to reuse the same vertex buffer across all of these drawing operations, rather than having each object have to allocate and own a separate vertex buffer. But can I do that? After I call drawPrimitives:, I guess the contents of the vertex buffer have to be copied over to the GPU, and I assume (?) that is not done synchronously, so it wouldn't be safe to immediately start modifying the vertex buffer for the next object's drawing. So: how do I know when Metal is done with the buffer and I can start modifying it again?

Two: Even if #1 has a well-defined answer, such that I could block until Metal is done with the buffer and then start modifying it for the next drawPrimitives: call, is that a reasonable design? I guess it would mean that my CPU thread would be repeatedly blocking to wait for the memory transfers, which is not great. So does that pretty much push me to a design where each object has its own vertex buffer?

Three: OK, suppose each object has its own vertex buffer, or I do one "big bang" render of the whole thing with a single big vertex buffer (this question applies to both designs, I think). After I call presentDrawable: and then commit on my command buffer, my app will go off and do a little work, and then will try to update the display, so my drawing code now executes again. I'd like to reuse the vertex buffers I allocated before, overwriting the data in them to do the new, updated display. But again: how do I know when that is safe? As I understand it, the fact that commit returned to my code doesn't mean Metal is done copying my vertex buffers to the GPU yet, and in the general case I have to assume that could take an arbitrarily long time, so it might not be done yet when I re-enter my drawing code. What's the right way to tell? And again: should I just block waiting until they are available (however I'm supposed to do that), or should I have a second set of vertex buffers that I can use in case Metal is still busy with the first set? (That seems like it just pushes the problem down the pike, since when my drawing code is entered for the third update both previously used sets of buffers might not yet be available, right? So then I could add a third set of vertex buffers, but then the fourth update...)

Four: For drawing the frame and subframes, I'd like to just write a reuseable "drawFrame" type of function that everybody can call, but I'm a bit puzzled as to the right design. With OpenGL this was easy:

- (void)drawViewFrameInBounds:(NSRect)bounds
{
    int ox = (int)bounds.origin.x, oy = (int)bounds.origin.y;

    glColor3f(0.77f, 0.77f, 0.77f);
    glRecti(ox, oy, ox + 1, oy + (int)bounds.size.height);
    glRecti(ox + 1, oy, ox + (int)bounds.size.width - 1, oy + 1);
    glRecti(ox + (int)bounds.size.width - 1, oy, ox + (int)bounds.size.width, oy + (int)bounds.size.height);
    glRecti(ox + 1, oy + (int)bounds.size.height - 1, ox + (int)bounds.size.width - 1, oy + (int)bounds.size.height);
}

But with Metal I'm not sure what a good design is. I guess the function can't just have its own little vertex buffer declared as a local static array, into which it throws vertices and then calls drawPrimitives:, because if it gets called twice in a row Metal might not yet have copied the vertex data from the first call when the second call wants to modify the buffer. I obviously don't want to have to allocate a new vertex buffer every time the function gets called. I could have the caller pass in a vertex buffer for the function to use, but that just pushes the problem out a level; how should the caller handle this situation, then? Maybe I could have the function append new vertices onto the end of a growing list of vertices in a buffer provided by the caller; but this seems to either force the whole render to be completely pre-planned (so that I can preallocate a big buffer of the right size to fit all of the vertices everybody will draw – which requires the top-level drawing code to somehow know how many vertices every object will end up generating, which violates encapsulation), or to do a design where I have an expanding vertex buffer that gets realloc'ed as needed when its capacity proves insufficient. I know how to do these things; but none of them feels right. I'm struggling with what the right design is, because I don't really understand Metal's memory model well enough, I think. Any advice? Apologies for the very long multi-part question, but I think all of this goes to the same basic lack of understanding.

Solution

The short answer to you underlying question is: you should not overwrite resources that are used by commands added to a command buffer until that command buffer has completed. The best way to determine that is to add a completion handler. You could also poll the status property of the command buffer, but that's not as good.

First, until you commit the command buffer, nothing is copied to the GPU. Further, as you noted, even after you commit the command buffer, you can't assume the data has been fully copied to the GPU.

Second, you should, in the simple case, put all drawing for a frame into a single command buffer. Creating and committing a lot of command buffers (like one for every object that draws) adds overhead.

These two points combined means you can't typically reuse a resource during the same frame. Basically, you're going to have to double- or triple-buffer to get correctness and good performance simultaneously.

A typical technique is to create a small pool of buffers guarded by a semaphore. The semaphore count is initially the number of buffers in the pool. Code which wants a buffer waits on the semaphore and, when that succeeds, take a buffer out of the pool. It should also add a completion handler to the command buffer that puts the buffer back in the pool and signals the semaphore.

You could use a dynamic pool of buffers. If code wants a buffer and the pool is empty, it creates a buffer instead of blocking. Then, when it's done, it adds the buffer to the pool, effectively increasing the size of the pool. However, there's typically no point in doing that. You would only need more than three buffers if the CPU is running way ahead of the GPU, and there's no real benefit to that.

As to your desire to have each object draw itself, that can certainly be done. I'd use a large vertex buffer along with some metadata about how much of it has been used so far. Each object that needs to draw will append its vertex data to the buffer and encode its drawing commands referencing that vertex data. You would use the vertexStart parameter to have the drawing command reference the right place in the vertex buffer.

You should also consider indexed drawing with the primitive restart value so there's only a single draw command which draws all of the primitives. Each object would add its primitive to the shared vertex data and index buffers and then some high level controller would do the draw.