Right edge of Metal Texture has anomalies


When I run this code on an integrated Intel GPU on a MacBook Pro, I have no problems. But when I run it on an iMac with an AMD GPU, this simple "Hello World" produces artifacts along the right edge:

[Image: rendered texture showing the artifacts]

The shader is very simple:

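#include <metal_stdlib>
using namespace metal;

// Writes a red/green gradient; the constants assume a 640x360 output texture.
kernel void helloworld(texture2d<float, access::write> outTexture [[texture(0)]],
                       uint2 gid [[thread_position_in_grid]])
{
    outTexture.write(float4((float)gid.x / 640,
                            (float)gid.y / 360, 0, 1),
                     gid);
}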

I've tried viewing the texture's contents in two different ways, and both show the problem:

Converting the texture to a CIImage and viewing it in an NSImageView, or calling getBytes, copying the pixel data directly, and manually building a PNG from it (skipping CIImage entirely). Both produce the same artifact, so it is in the texture itself.

Any ideas what causes this kind of problem?

UPDATE:

Fascinating: the issue appears to be related to threadsPerThreadgroup, but I'm not sure why it would be.

The above image was created with 24 threads per group. If I change this to 16, the artifacts move to the bottom edge instead.

What I don't understand is that the gid position should be fixed regardless of which threadgroup is actually running, shouldn't it? It is the individual thread's position in the whole image.


Solution

  • With dispatchThreadgroups(), the compute kernel can be invoked for grid positions outside of your width*height grid, because the dispatched grid is always a whole number of threadgroups. You have to return early for those positions explicitly, with something like:

    if (gid.x >= 640 || gid.y >= 360)
        return;
    

    Otherwise, the kernel will attempt to write outside the bounds of the texture (with colors some of whose components are larger than 1), which has undefined results.

    With dispatchThreads(), Metal takes care of this for you and won't invoke outside of your specified grid size.

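    The difference in behavior between 24 and 16 threads per group comes down to whether that size divides evenly into 640 and 360. Whichever dimension it doesn't divide evenly is the one that gets over-invoked: 24 divides 360 but not 640, so the extra invocations fall past the right edge, while 16 divides 640 but not 360, so they fall past the bottom edge.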
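
The arithmetic behind this can be checked on the CPU. The sketch below is plain C++, not Metal API code, and rests on two assumptions the question does not confirm: that the threadgroups are square (24x24 or 16x16), and that the host rounds the threadgroup count up so the whole 640x360 texture is covered. The names `Coverage`, `coverage`, and `outOfBoundsInvocations` are hypothetical helpers made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are Metal APIs.

struct Coverage {
    uint32_t groups;     // threadgroups dispatched along this dimension
    uint32_t threads;    // threads actually launched (groups * groupSize)
    uint32_t overshoot;  // threads whose gid lies past the texture edge
};

// Assumes the host rounds the threadgroup count up (ceiling division)
// so that every pixel along this dimension is covered.
Coverage coverage(uint32_t extent, uint32_t groupSize) {
    assert(groupSize > 0);
    uint32_t groups = (extent + groupSize - 1) / groupSize;
    uint32_t threads = groups * groupSize;
    return {groups, threads, threads - extent};
}

// Walks the full dispatched grid (assuming square groupSize x groupSize
// threadgroups) and counts the positions that a guard of the form
// `if (gid.x >= width || gid.y >= height) return;` would skip.
uint64_t outOfBoundsInvocations(uint32_t width, uint32_t height,
                                uint32_t groupSize) {
    Coverage x = coverage(width, groupSize);
    Coverage y = coverage(height, groupSize);
    uint64_t skipped = 0;
    for (uint32_t gy = 0; gy < y.threads; ++gy) {
        for (uint32_t gx = 0; gx < x.threads; ++gx) {
            if (gx >= width || gy >= height) {
                ++skipped;
            }
        }
    }
    return skipped;
}
```

Under those assumptions, `coverage(640, 24)` launches 648 threads along x (8 columns past the right edge) while `coverage(360, 24)` lands exactly on 360. With a group size of 16 the roles swap: x is covered exactly and y runs to 368 (8 rows past the bottom edge). Those extra positions are the ones the early return has to discard.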