I'm confused about the MTLTexture array for CoreML's custom layers. In my mlmodel, the input MTLTexture of the custom layer has 32 channels and the output has 8 channels. The data type of MTLTexture is 16-bit floats, or half. So the input texture_array consists of 8 slices and the output consists of 2 slices.
func encode(commandBuffer: MTLCommandBuffer, inputs: [MTLTexture], outputs: [MTLTexture]) throws {
    print(#function, inputs.count, outputs.count)
    if let encoder = commandBuffer.makeComputeCommandEncoder() {
        for i in 0..<inputs.count {
            encoder.setTexture(inputs[i], index: 0)
            encoder.setTexture(outputs[i], index: 1)
            encoder.dispatch(pipeline: psPipeline, texture: inputs[i])
        }
        // End encoding once, after all dispatches; an encoder cannot be used after endEncoding().
        encoder.endEncoding()
    }
}
In my compute kernel function
kernel void pixelshuffle(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    ushort3 gid [[thread_position_in_grid]])
{
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height()
        || gid.z >= inTexture.get_array_size()) {
        return;
    }
    const half4 src = half4(inTexture.read(gid.xy, gid.z));
    // do other things
}
If the input and output texture arrays are laid out as [C][H][W], then for gid = (0, 0, 0), which channels do the values in src.rgba come from, and what are their coordinates within those channels?
Is it src.r [0][0][0], src.g [1][0][0], src.b [2][0][0], src.a [3][0][0]? Or is it src.r [0][0][0], src.g [0][0][1], src.b [0][0][2], src.a [0][0][3]?
And how can I get the raw data of the input texture in the encode function and print it out?
In your compute kernel, src contains the RGBA values of a single pixel in the texture, and each value is a 16-bit float.
The texture's width corresponds to W, the texture's height to H, and the texture's slices hold the C channels, with 4 channels per slice.
So the number of slices in the texture is C/4 rounded up, that is floor((C + 3)/4), and gid.z goes from 0 to floor((C + 3)/4) - 1.
(Although that also depends on what your encoder.dispatch(pipeline:texture:) function does, since this does not appear to be a standard method on MTLComputeCommandEncoder.)
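To make the slice arithmetic concrete, here is a minimal sketch in plain Swift. The function name sliceCount(channels:) is made up for illustration and is not part of Core ML or Metal; it only reproduces the floor((C + 3)/4) calculation described above.

```swift
// Hypothetical helper (not a Core ML or Metal API):
// number of RGBA slices needed to hold `channels` feature channels.
// Each slice packs up to 4 channels, so this is channels / 4 rounded up.
func sliceCount(channels: Int) -> Int {
    return (channels + 3) / 4   // integer division floors the result
}

// The question's layer: 32 input channels and 8 output channels.
print(sliceCount(channels: 32))  // 8
print(sliceCount(channels: 8))   // 2

// A channel count that is not a multiple of 4 still rounds up.
print(sliceCount(channels: 6))   // 2
```

Valid values of gid.z are therefore 0 through sliceCount(channels: C) - 1, which matches the get_array_size() bounds check in the kernel.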
That means src.r is the first channel in the slice, .g is the second channel in the slice, .b is the third channel, and .a is the fourth channel in the slice. The first slice has channels 0-3, the second has channels 4-7, and so on.
So your first guess is the correct one:
src.r [0][0][0], src.g [1][0][0], src.b [2][0][0], src.a [3][0][0]
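The same mapping can be written as index arithmetic. The sketch below is plain Swift with illustrative names (TexelLocation, texelLocation, channelIndex) that do not come from Core ML or Metal; it assumes the packing described above, where slice s holds channels 4s through 4s + 3.

```swift
// Hypothetical type for illustration: where element [c][h][w] of a
// [C][H][W] tensor lives inside a texture2d_array<half>.
struct TexelLocation: Equatable {
    let slice: Int      // corresponds to gid.z
    let y: Int          // corresponds to gid.y
    let x: Int          // corresponds to gid.x
    let component: Int  // 0 = r, 1 = g, 2 = b, 3 = a
}

// Map a tensor index to its slice, pixel, and RGBA component.
func texelLocation(channel c: Int, row h: Int, column w: Int) -> TexelLocation {
    return TexelLocation(slice: c / 4, y: h, x: w, component: c % 4)
}

// Inverse mapping: which channel a given component of a slice holds.
func channelIndex(slice: Int, component: Int) -> Int {
    return slice * 4 + component
}

// For gid = (0, 0, 0): r, g, b, a hold channels 0, 1, 2, 3 at pixel (0, 0).
print(channelIndex(slice: 0, component: 1))  // 1
// The second slice starts at channel 4.
print(channelIndex(slice: 1, component: 0))  // 4
```

Inside the kernel, the equivalent of channelIndex is gid.z * 4 + i, where i is the component index into src.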
Also note that I wrote a blog post about custom kernels in Core ML that might be useful: http://machinethink.net/blog/coreml-custom-layers/