Metal kernel shader -- fade implementation

I haven't written many Metal kernel shaders yet; here's a fledgling "fade" shader between two RGBX-32 images, using a tween value of 0.0 to 1.0 between inBuffer1 (0.0) to inBuffer2 (1.0).

Is there something I'm missing here? Something strikes me that this may be terribly inefficient.

My first inkling is to attempt to do subtraction and multiplication using the vector data types (eg. char4) thinking that might be better, but the results of this are certainly undefined (as some components will be negative).

Also, is there some advantage to using MTLTexture versus MTLBuffer objects as I've done?

kernel void fade_Kernel(device const uchar4  *inBuffer1  [[ buffer(0) ]],
                        device const uchar4  *inBuffer2  [[ buffer(1) ]],
                        device const float   *tween      [[ buffer(2) ]],
                        device uchar4        *outBuffer  [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    const float t = tween[0];
    uchar4 pixel1 = inBuffer1[gid];
    uchar4 pixel2 = inBuffer2[gid];

    // these values will be negative
    short r=(pixel2.r-pixel1.r)*t;  
    short g=(pixel2.g-pixel1.g)*t;
    short b=(pixel2.b-pixel1.b)*t;

    outBuffer[gid]=uchar4(pixel1.r+r,pixel1.g+g,pixel1.b+b,0xff);
}

Solution

First, you should probably declare the tween parameter as:

constant float &tween [[ buffer(2) ]],

Using the constant address space is more appropriate for a value like this that's the same for all invocations of the function (and not indexed into by grid position or the like). Also, making it a reference instead of a pointer tells the compiler that you won't be indexing other elements in the "array" that a pointer might be.

Finally, there's a mix() function that performs exactly the sort of computation that you're doing here. So, you could replace the body of the function with:

uchar4 pixel1 = inBuffer1[gid];
uchar4 pixel2 = inBuffer2[gid];

outBuffer[gid] = uchar4(uchar3(mix(float3(pixel1.rgb), float3(pixel2.rgb), tween)), 0xff);

As to whether it would be better to use textures, that depends somewhat on what you plan to do with the result after running this kernel. If you're going to be doing texture-like things with it anyway, it might be better to use textures all throughout. Indeed, it might be better to use drawing operations with blending rather than a compute kernel. After all, such blending is something GPUs have to do all the time, so that path is probably fast. You'd have to test the performance of each approach.