Metal Shading Language - buffer binding


I want a particle system whose particle count grows over time. I was advised to allocate a larger buffer than I currently need so that I can vary the number of particles. My plan is to size the buffer for a maximum count and, in the shader, use a struct containing an array to receive the particle attributes.

I have this in my Swift code:

var vectMaxCount = 10
var metalvects = [float3(0.0,0.0,0.0),float3(1.0,0.0,0.0),float3(2.0,0.0,0.0)]
var vectBuffer: MTLBuffer!

Then I create the buffer:

vectBuffer  = device!.makeBuffer(length: MemoryLayout<float3>.size * vectMaxCount, options: [])

and update the buffer accordingly:

...
command_encoder.setBuffer(vectBuffer, offset: 0, at: 2)
var bufferPointer = vectBuffer.contents()
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)

let threadGroupCount = MTLSizeMake(8, 8, 1)
let threadGroups = MTLSizeMake(drawable.texture.width / threadGroupCount.width, drawable.texture.height / threadGroupCount.height, 1)
command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
command_encoder.endEncoding()
command_buffer.present(drawable)
command_buffer.commit()

and try to read it in the Metal file:

struct Vects
{
    float3 position[100];
};

kernel void compute(texture2d<float, access::write> output [[texture(0)]],
                    constant Vects &vects [[buffer(2)]],
                    uint2 gid [[thread_position_in_grid]]) {
...
}

and I got an error:

validateComputeFunctionArguments:727: failed assertion `(length - offset)(160) must be >= 1600 at buffer binding at index 2 for vects[0].'

The error is reported at the line command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount). I read a bit about buffer binding, and I suspect the problem is in the way I specify threadGroupCount or threadGroups.

If I change float3 position[100]; to float3 position[7];, it still works. Anything more than 7 produces a similar error.

How can I resolve this?

Also, is there a good formula, or even a rule of thumb, for choosing threadGroups and threadGroupCount?

Update01

Based on Ken Thomases's answer, I changed my code to:

swift:

vectBuffer  = device!.makeBuffer(length: MemoryLayout<float3>.stride * metalvects.count, options: [])
...
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.stride * metalvects.count)
...

metal:

struct Vects
{
    float3 position[3];
};
...

It works for now. But how can I allocate a larger buffer whose extra capacity will only be used later in the app, as this post mentions?


Solution

  • There are multiple issues here.

    You are defining Vects with a specific size. That allows Metal to check if the size of the buffer at index 2 is big enough to match the size of your vects variable. It is complaining because it isn't big enough. (It wouldn't be able to do this check if vects were declared as constant float3 *vects [[buffer(2)]], for example.)

    Second, the size of your buffer — MemoryLayout<float3>.size * vectMaxCount — is incorrect. It fails to take into account the alignment of float3 and therefore the padding that exists between elements in your [float3] array. As noted in the documentation for MemoryLayout, you should always use stride, not size, when calculating allocation sizes.

    This is why the failure happens when Vects::position is 8 or more elements long. You would expect it to start at 11 elements because vectMaxCount is 10, but your buffer is shorter than an array of vectMaxCount float3s. To be specific, your buffer is 10 * 12 == 120 bytes long. The stride of float3 is 16 and 120 / 16 == 7.5, so only 7 whole elements fit in the buffer.
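The arithmetic in the paragraph above can be checked with a short sketch. It hard-codes the size (12) and stride (16) quoted in the answer as plain constants rather than querying MemoryLayout, because the values MemoryLayout reports may differ between SDK and Swift versions. The names elementSize, elementStride, and maxElementCount are illustrative and do not come from the question.

```swift
// Byte sizes as quoted in the answer; hard-coded assumptions, not queried from MemoryLayout.
let elementSize = 12      // size of one float3, per the answer
let elementStride = 16    // stride of one float3, per the answer
let maxElementCount = 10  // plays the role of vectMaxCount in the question

// What the question's code allocates: size * count.
let allocatedBytes = elementSize * maxElementCount

// How many whole shader-side elements (one stride each) fit in that allocation.
let wholeElementsThatFit = allocatedBytes / elementStride

// What the allocation needs to be for maxElementCount elements: stride * count.
let requiredBytes = elementStride * maxElementCount

print(allocatedBytes, wholeElementsThatFit, requiredBytes)  // prints "120 7 160"
```

With these numbers, a position array of 7 elements passes validation and 8 does not, which matches the behaviour described in the question.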

    If you switch from size to stride when allocating your buffer and change the element count of Vects::position to 10 to match vectMaxCount, then you'll get past this immediate issue. However, there are additional problems lurking.

    Your compute function as it currently stands doesn't know how many elements of vects.position are actually filled. You need to pass in the actual count of elements.

    This line:

    memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)
    

    is incorrect (even after replacing size with stride). It reads past the end of metalvects. That's because the number of elements in metalvects is less than vectMaxCount. You should use metalvects.count instead of vectMaxCount.
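As a rough sketch of the last two points, the code below allocates room for the maximum count but copies only the elements that actually exist. It deliberately avoids Metal so that it runs anywhere: SIMD3<Float> stands in for float3, and a raw allocation stands in for vectBuffer.contents(). The Vec3 alias and the copyLiveElements helper are hypothetical names introduced only for illustration.

```swift
// Stand-in for the question's float3; SIMD3<Float> ships with the Swift standard library.
typealias Vec3 = SIMD3<Float>

/// Copies at most `capacity` elements of `vects` into `storage` and returns
/// how many elements were actually written.
/// `storage` is a hypothetical stand-in for `vectBuffer.contents()`.
func copyLiveElements(_ vects: [Vec3],
                      into storage: UnsafeMutableRawPointer,
                      capacity: Int) -> Int {
    let liveCount = min(vects.count, capacity)
    guard liveCount > 0 else { return 0 }
    vects.withUnsafeBytes { source in
        // Copy only the bytes that belong to live elements, measured in strides.
        storage.copyMemory(from: source.baseAddress!,
                           byteCount: MemoryLayout<Vec3>.stride * liveCount)
    }
    return liveCount
}

let vectMaxCount = 10
let metalvects = [Vec3(0, 0, 0), Vec3(1, 0, 0), Vec3(2, 0, 0)]

// Reserve room for the maximum count up front, as the Metal buffer would.
let storage = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Vec3>.stride * vectMaxCount,
    alignment: MemoryLayout<Vec3>.alignment)

let liveCount = copyLiveElements(metalvects, into: storage, capacity: vectMaxCount)

// Read back the last copied element to confirm the copy landed where expected.
let lastCopied = storage.load(
    fromByteOffset: MemoryLayout<Vec3>.stride * (liveCount - 1),
    as: Vec3.self)
print(liveCount, lastCopied.x)  // prints "3 2.0"

storage.deallocate()
```

The returned liveCount is the value the compute function would also need to receive, for example through a separate buffer binding, so that it only reads the filled part of the array.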