In WWDC session 510, Apple engineers present support for writing CIKernels in Metal and claim that they should run faster.
I've put together a test project that implements motion blur in both Metal and GLSL (the code is similar to the code from session 510). Sometimes the Metal kernel is faster and sometimes the GLSL kernel is faster, but I definitely can't see the Metal kernel performing consistently and significantly better across the board. Is it supposed to be like this, or am I missing something?
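When results flip between runs like this, the measuring method matters. Below is a minimal sketch of a comparison harness, assuming each variant can be wrapped in a closure that does the complete work. With Core Image, processing is deferred until the image is actually rendered, so in the real project the closure would have to include the render call, not just building the filtered CIImage. The names `timeMs` and `compare` are hypothetical. The harness discards warm-up rounds, since a first run that includes one-time setup (the first GLSL and Metal runs are several times slower than the rest) would skew the averages, and it alternates which variant goes first in each round so neither one is always measured under the same device conditions.

```swift
import Dispatch

/// Hypothetical helper: wall-clock time of one call to `body`, in milliseconds,
/// measured with the monotonic DispatchTime clock.
func timeMs(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000
}

/// Hypothetical helper: runs `a` and `b` for `warmup + rounds` rounds, alternating
/// which one goes first, and returns only the timings recorded after the warm-up rounds.
func compare(rounds: Int, warmup: Int = 1,
             _ a: () -> Void, _ b: () -> Void) -> (a: [Double], b: [Double]) {
    var timesA: [Double] = []
    var timesB: [Double] = []
    for roundIndex in 0..<(warmup + rounds) {
        var da = 0.0
        var db = 0.0
        if roundIndex % 2 == 0 {
            da = timeMs(a)
            db = timeMs(b)
        } else {
            db = timeMs(b)
            da = timeMs(a)
        }
        // Keep samples only once the warm-up rounds are over.
        if roundIndex >= warmup {
            timesA.append(da)
            timesB.append(db)
        }
    }
    return (timesA, timesB)
}

// Usage sketch: in the real project each closure would render the motion-blurred
// image with one kernel (GLSL or Metal); here they are placeholder workloads.
let result = compare(rounds: 10, warmup: 1,
                     { _ = (0..<100_000).reduce(0, +) },
                     { _ = (0..<200_000).reduce(0, +) })
print("A runs:", result.a.count, "B runs:", result.b.count)
```

Comparing medians of the returned arrays, rather than individual runs, makes a single slow outlier less likely to decide which kernel "wins".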
Note: the project won't run on the simulator; you need an A8 (or later) powered device.
It looks like at least some of this is hardware-related. Here are my iPad Pro 10.5-inch results:
glsl 1 took 229.572057723999ms
glsl 2 took 49.1310358047485ms
glsl 3 took 46.7269420623779ms
glsl 4 took 53.08997631073ms
glsl 5 took 48.9979982376099ms
glsl 6 took 49.0390062332153ms
glsl 7 took 52.5139570236206ms
glsl 8 took 46.4930534362793ms
glsl 9 took 39.6310091018677ms
glsl 10 took 45.9860563278198ms
metal 1 took 77.7549743652344ms
metal 2 took 44.1800355911255ms
metal 3 took 46.0859537124634ms
metal 4 took 45.3709363937378ms
metal 5 took 43.5279607772827ms
metal 6 took 38.9848947525024ms
metal 7 took 37.1809005737305ms
metal 8 took 37.8340482711792ms
metal 9 took 37.6850366592407ms
metal 10 took 37.5720262527466ms
And my iPhone SE results:
glsl 1 took 394.147992134094ms
glsl 2 took 94.601035118103ms
glsl 3 took 81.4379453659058ms
glsl 4 took 76.9931077957153ms
glsl 5 took 77.0320892333984ms
glsl 6 took 75.8579969406128ms
glsl 7 took 76.9950151443481ms
glsl 8 took 77.8199434280396ms
glsl 9 took 79.7009468078613ms
glsl 10 took 79.4800519943237ms
metal 1 took 146.992921829224ms
metal 2 took 88.6669158935547ms
metal 3 took 81.8150043487549ms
metal 4 took 78.1329870223999ms
metal 5 took 79.5910358428955ms
metal 6 took 93.6589241027832ms
metal 7 took 94.8940515518188ms
metal 8 took 89.0530347824097ms
metal 9 took 84.3830108642578ms
metal 10 took 77.949047088623ms
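To compare the two devices on equal footing, the numbers above can be reduced to a mean and a median with the first (cold) run excluded. This is a small sketch; the timings are copied verbatim from the output above, and `summarize` and `oneDecimal` are hypothetical helper names.

```swift
// Timings in milliseconds, copied from the runs above; element 0 is run 1 (the cold run).
let ipadGLSL: [Double] = [229.572057723999, 49.1310358047485, 46.7269420623779,
                          53.08997631073, 48.9979982376099, 49.0390062332153,
                          52.5139570236206, 46.4930534362793, 39.6310091018677,
                          45.9860563278198]
let ipadMetal: [Double] = [77.7549743652344, 44.1800355911255, 46.0859537124634,
                           45.3709363937378, 43.5279607772827, 38.9848947525024,
                           37.1809005737305, 37.8340482711792, 37.6850366592407,
                           37.5720262527466]
let seGLSL: [Double] = [394.147992134094, 94.601035118103, 81.4379453659058,
                        76.9931077957153, 77.0320892333984, 75.8579969406128,
                        76.9950151443481, 77.8199434280396, 79.7009468078613,
                        79.4800519943237]
let seMetal: [Double] = [146.992921829224, 88.6669158935547, 81.8150043487549,
                         78.1329870223999, 79.5910358428955, 93.6589241027832,
                         94.8940515518188, 89.0530347824097, 84.3830108642578,
                         77.949047088623]

/// Hypothetical helper: drops the first `warmup` samples and returns mean and median of the rest.
func summarize(_ samples: [Double], warmup: Int = 1) -> (mean: Double, median: Double) {
    let warm = samples.dropFirst(warmup).sorted()
    precondition(!warm.isEmpty, "need at least one sample after warm-up")
    let mean = warm.reduce(0, +) / Double(warm.count)
    let mid = warm.count / 2
    // Odd count: middle element; even count: average of the two middle elements.
    let median = warm.count % 2 == 1 ? warm[mid] : (warm[mid - 1] + warm[mid]) / 2
    return (mean, median)
}

/// Rounds to one decimal place for display.
func oneDecimal(_ x: Double) -> Double { (x * 10).rounded() / 10 }

for (label, samples) in [("iPad Pro GLSL", ipadGLSL), ("iPad Pro Metal", ipadMetal),
                         ("iPhone SE GLSL", seGLSL), ("iPhone SE Metal", seMetal)] {
    let s = summarize(samples)
    print("\(label): mean \(oneDecimal(s.mean)) ms, median \(oneDecimal(s.median)) ms")
}
```

Rounded to 0.1 ms, the warm runs give medians of 49.0 ms (GLSL) vs 39.0 ms (Metal) on the iPad Pro, but 77.8 ms (GLSL) vs 84.4 ms (Metal) on the iPhone SE; the means point the same way on each device. That fits the observation that at least some of the difference is hardware-related.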
A question and a thought: