Search code examples
image-processingkernelswift4metalmetalkit

Passing parameters in Metal Compute Kernel using Swift 4


On the CPU side, I have a structure I want to pass to the compute kernel:

  private struct BoundingBoxParameters {
    var x: Float = 0
    var y: Float = 0
    var width: Float = 0
    var height: Float = 0
    var levelOfDetail: Float = 1.0
    var dummy: Float = 1.0  // Needed for success
  }

Before running the kernel, I pass in the data to the MTLComputeCommandEncoder:

Option 1 (directly):

commandEncoder!.setBytes(&params, length: MemoryLayout<BoundingBoxParameters>.size, index: 0)

Option 2 (indirectly via MTLBuffer):

boundingBoxBuffer.contents().copyBytes(from: &params, count: MemoryLayout<BoundingBoxParameters>.size)
commandEncoder!.setBuffer(boundingBoxBuffer, offset: 0, index: 0)

Either option works fine if the 'dummy' variable exists in the structure, but fails if the 'dummy' variable does not exists. The code fails in the call to:

commandEncoder!.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)

With the error:

validateComputeFunctionArguments:820: failed assertion `Compute Function(resizeImage): argument params[0] from buffer(0) with offset(0) and length(20) has space for 20 bytes, but argument has a length(24).'

On the Metal Kernel Side, here's the relevant code snippets:

struct BoundingBoxParameters {
  float2 topLeft;
  float2 size;
  float levelOfDetail;
};

kernel void resizeImage(constant BoundingBoxParameters *params [[buffer(0)]],
                        texture2d<half, access::sample> sourceTexture [[texture(0)]],
                        texture2d<half, access::write> destTexture [[texture(1)]],
                        sampler samp [[sampler(0)]],
                        uint2 gridPosition [[thread_position_in_grid]]) {
  float2 destSize = float2(destTexture.get_width(0), destTexture.get_height(0));
  float2 sourceCoords = float2(gridPosition) / destSize;
  sourceCoords *= params->size;
  sourceCoords += params->topLeft;
  float lod = params->levelOfDetail;
  half4 color = sourceTexture.sample(samp, sourceCoords, level(lod));
  destTexture.write(color, gridPosition);
}

I also get a similar issue when trying to pass in a 3x3 matrix to another compute kernel. It complains that 36 bytes are provided but expecting 48.

Anyone got any ideas on the issue?


Solution

  • First-off, I want to point out that you should not be using size when you need to get an actual length of a Swift type laid out in memory. You should use stride for that. According to Swift's Type Layout:

    The final size and alignment are the size and alignment of the aggregate. The stride of the type is the final size rounded up to alignment.

    This answer goes into detail about memory layout in Swift if you want to get a better understanding of the topic.


    The problem is that your Metal struct that uses float2 and a Swift struct that replaces it with two separate Float fields have different memory layouts.

    Size (stride in case of Swift) of the struct needs to be a multiple of the largest alignment of any struct member. The largest alignment in your Metal struct is 8 bytes (alignment of float2) so there's a padding at the tail of struct after the float value.

    struct BoundingBoxParameters {
        float2 topLeft; // 8 bytes
        float2 size; // 8 bytes
        float levelOfDetail; // 4 bytes
        // 4 bytes of padding so that size of struct is multiple 
        // of the largest alignment (which is 8 bytes)
    
    }; // 24 bytes in total
    

    So your Metal struct does in fact end up taking up 24 bytes as the error suggests.

    At the same time, your Swift struct, having the largest alignment of 4 bytes, only needs 20 bytes.

    private struct BoundingBoxParameters {
        var x: Float = 0 // 4 bytes
        var y: Float = 0 // 4 bytes
        var width: Float = 0 // 4 bytes
        var height: Float = 0 // 4 bytes
        var levelOfDetail: Float = 1.0 // 4 bytes
        // no need for any padding 
    
    } // 20 bytes in total
    

    That's why they end up incompatible with each other and dummy field compensating 4 missing bytes to Swift struct.

    To resolve this I suggest you use float2 from simd in Swift instead of Floats:

    import simd 
    
    private struct BoundingBoxParameters {
        var topLeft = float2(x: 0, y: 0)
        var size = float2(x: 0, y: 0)
        var levelOfDetail: Float = 1.0 
    }
    

    Don't forget to use MemoryLayout<BoundingBoxParameters>.stride (24 bytes) to get the length instead of size (20 bytes).


    Same goes for the 3x3 matrix case: Metal's float3x3 has size of 48 bytes and alignment of 16 bytes. As I assume, you've created a Swift struct with 9 Floats which would have stride/size of 36 bytes and alignment of 4 bytes. Hence, the mis-alignment. Use matrix_float3x3 from simd.

    In general, for any cases when you use vectors or matrices in Metal you should use corresponding simd types in Swift.