I want the number of particles to grow over time. I was advised to allocate a larger buffer than I currently need so that I can vary the particle count. My plan is to size the buffer for a maximum count and, in the shader, declare a struct with an array that holds the particle attributes.
I have this in my Swift code:
var vectMaxCount = 10
var metalvects = [float3(0.0,0.0,0.0),float3(1.0,0.0,0.0),float3(2.0,0.0,0.0)]
var vectBuffer: MTLBuffer!
Then I create the buffer:
vectBuffer = device!.makeBuffer(length: MemoryLayout<float3>.size * vectMaxCount, options: [])
and update the buffer accordingly:
...
command_encoder.setBuffer(vectBuffer, offset: 0, at: 2)
var bufferPointer = vectBuffer.contents()
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)
let threadGroupCount = MTLSizeMake(8, 8, 1)
let threadGroups = MTLSizeMake(drawable.texture.width / threadGroupCount.width, drawable.texture.height / threadGroupCount.height, 1)
command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
command_encoder.endEncoding()
command_buffer.present(drawable)
command_buffer.commit()
and try to read it in the Metal file:
struct Vects
{
float3 position[100];
};
kernel void compute(texture2d<float, access::write> output [[texture(0)]],
constant Vects &vects [[buffer(2)]],
uint2 gid [[thread_position_in_grid]]) {
...
}
and I got an error:
validateComputeFunctionArguments:727: failed assertion `(length - offset)(160) must be >= 1600 at buffer binding at index 2 for vects[0].'
The error points at the line command_encoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount). I read a bit about buffer binding, and I suspect the problem is in how I set up threadGroupCount or threadGroups.
If I change float3 position[100]; to float3 position[7]; it still works. Anything more than 7 produces a similar error.
How can I resolve this?
Also, is there a good formula, or even a rule of thumb, for choosing threadGroups and threadGroupCount?
Update01
Based on Ken Thomases's answer, I changed my code to:
swift:
vectBuffer = device!.makeBuffer(length: MemoryLayout<float3>.stride * metalvects.count, options: [])
...
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.stride * metalvects.count)
...
metal:
struct Vects
{
float3 position[3];
};
...
This works for now. But how can I allocate a larger buffer whose extra capacity will only be used later in the app, as this post mentioned?
There are multiple issues here.
First, you are defining Vects with a specific size. That allows Metal to check whether the buffer at index 2 is big enough to hold your vects variable, and it is complaining because the buffer isn't. (It wouldn't be able to do this check if vects were declared as constant float3 *vects [[buffer(2)]], for example.)
Second, the size of your buffer, MemoryLayout<float3>.size * vectMaxCount, is incorrect. It fails to take into account the alignment of float3 and therefore the padding that exists between elements in your [float3] array. As noted in the documentation for MemoryLayout, you should always use stride, not size, when calculating allocation sizes.
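To see the difference on your own toolchain, here is a minimal sketch that prints the three layout values and computes a buffer length from stride. It assumes SIMD3<Float> stands in for float3 (as far as I know, current Swift toolchains treat the two as the same type); the Vec3 alias is just an illustrative name, and the printed numbers may differ from the ones quoted in this answer depending on the Swift version.

```swift
// Assumption: SIMD3<Float> stands in for simd's float3.
typealias Vec3 = SIMD3<Float>

let vectMaxCount = 10

// size: bytes occupied by one value
// stride: distance in bytes between consecutive array elements
// alignment: required address alignment of a value
let elementSize = MemoryLayout<Vec3>.size
let elementStride = MemoryLayout<Vec3>.stride
let elementAlignment = MemoryLayout<Vec3>.alignment

// Allocation sizes should be based on stride, not size.
let bufferLength = elementStride * vectMaxCount

print("size: \(elementSize), stride: \(elementStride), alignment: \(elementAlignment)")
print("buffer length for \(vectMaxCount) elements: \(bufferLength) bytes")
```

The resulting bufferLength is the value you would pass as the length argument of makeBuffer.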
This is why the failure happens when Vects::position is 8 or more elements long. You would expect it to start at 11 elements because vectMaxCount is 10, but your buffer is shorter than an array of vectMaxCount float3s. To be specific, your buffer is 10 * 12 == 120 bytes long. The stride of float3 is 16, and 120 / 16 == 7.5, so only 7 whole elements fit.
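The arithmetic can be spelled out as a small sketch. The 12 and 16 are the size and stride figures quoted in this answer, hard-coded purely for illustration rather than queried from MemoryLayout, and the variable names are made up.

```swift
// Figures quoted in the answer, hard-coded for illustration.
let quotedSize = 12      // size of one float3, as quoted
let quotedStride = 16    // stride of one float3, as quoted
let vectMaxCount = 10

// What the original code allocated (size * count).
let allocatedBytes = quotedSize * vectMaxCount

// How many whole elements fit when they are laid out stride bytes apart.
let wholeElementsThatFit = allocatedBytes / quotedStride

// What an array of vectMaxCount elements actually needs (stride * count).
let bytesNeeded = quotedStride * vectMaxCount

print("allocated: \(allocatedBytes) bytes, fits \(wholeElementsThatFit) elements")
print("needed for \(vectMaxCount) elements: \(bytesNeeded) bytes")
```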
If you switch from size to stride when allocating your buffer and change the element count of Vects::position to 10 to match vectMaxCount, then you'll get past this immediate issue. However, there are additional problems lurking.
Your compute function as it currently stands doesn't know how many elements of vects.position are actually filled. You need to pass in the actual count of elements.
This line:
memcpy(bufferPointer, &metalvects, MemoryLayout<float3>.size * vectMaxCount)
is incorrect (even after replacing size with stride). It reads past the end of metalvects, because the number of elements in metalvects is less than vectMaxCount. You should use metalvects.count instead of vectMaxCount.
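Putting the last two points together, here is a rough sketch of keeping capacity and live count separate. To stay runnable without a GPU it uses a plain raw allocation as a stand-in for vectBuffer.contents(). The Vec3 alias and the copyLiveElements helper are hypothetical names of mine, not Metal API, and SIMD3<Float> is assumed to stand in for float3.

```swift
// Assumption: SIMD3<Float> stands in for simd's float3.
typealias Vec3 = SIMD3<Float>

let vectMaxCount = 10   // capacity of the buffer, in elements
let metalvects: [Vec3] = [Vec3(0, 0, 0), Vec3(1, 0, 0), Vec3(2, 0, 0)]

let elementStride = MemoryLayout<Vec3>.stride
let capacityBytes = elementStride * vectMaxCount

/// Hypothetical helper: copies at most `capacity` elements into `destination`
/// and returns how many elements were actually copied.
func copyLiveElements(_ source: [Vec3],
                      into destination: UnsafeMutableRawPointer,
                      capacity: Int) -> Int {
    let liveCount = min(source.count, capacity)
    source.withUnsafeBytes { bytes in
        if let base = bytes.baseAddress, liveCount > 0 {
            // Copy only the filled elements, never the full capacity.
            destination.copyMemory(from: base,
                                   byteCount: liveCount * MemoryLayout<Vec3>.stride)
        }
    }
    return liveCount
}

// Stand-in for vectBuffer.contents(): storage sized for the full capacity.
let storage = UnsafeMutableRawPointer.allocate(
    byteCount: capacityBytes,
    alignment: MemoryLayout<Vec3>.alignment)

let liveCount = copyLiveElements(metalvects, into: storage, capacity: vectMaxCount)

// Read one element back to check the copy landed where expected.
let copiedBack = storage.load(fromByteOffset: 2 * elementStride, as: Vec3.self)
print("copied \(liveCount) of \(vectMaxCount) slots; element 2 = \(copiedBack)")

storage.deallocate()
```

With a real MTLBuffer the destination would be vectBuffer.contents(), and liveCount is the value you would hand to the compute function separately so it knows how many entries are valid.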