Metal sampling texture from argument buffer generates GPU Address Fault


I have a fragment shader function that samples a cube map texture (texturecube). It was working fine, until I decided to pass this texture, with other arguments, through an argument buffer:

#ifdef __METAL_VERSION__
struct SkyParams {
    vector_float3 ambientRadiance;
    texturecube <float,access::sample> cubeMap;
    texture2d <float,access::sample> lutMap;
};
#endif

My shader function looks like this (I have simplified it as much as possible):

float4 enlight(PixelData pixel, constant SkyParams &sky) {
    constexpr sampler sam(mip_filter::linear, mag_filter::linear, min_filter::linear, address::repeat);
    float3 irradiance = sky.cubeMap.sample(sam, pixel.normal, level(4)).rgb;
    float3 diffuse = pixel.albedo * mix(pow(irradiance, 0.2), irradiance, pixel.metalness);
    float3 ambient = diffuse * pixel.ao * sky.ambientRadiance;
    return float4(ambient, 1.0);
}

It generates GPU errors: Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

What is weird is that if I just change the return value, without even commenting out the texture sampling, the error disappears:

float4 enlight(PixelData pixel, constant SkyParams &sky) {
    constexpr sampler sam(mip_filter::linear, mag_filter::linear, min_filter::linear, address::repeat);
    float3 irradiance = sky.cubeMap.sample(sam, pixel.normal, level(4)).rgb;
    float3 diffuse = pixel.albedo * mix(pow(irradiance, 0.2), irradiance, pixel.metalness);
    float3 ambient = diffuse * pixel.ao * sky.ambientRadiance;
    return float4(pixel.albedo, 1.0);
}

How can I debug this? Any idea?

Note that my argument buffer seems to be correctly encoded: [image]


Solution

  • kIOGPUCommandBufferCallbackErrorPageFault means that the GPU page faulted when trying to read or write something, because it couldn't find a mapping from virtual memory to physical memory. A resource that has this mapping is said to be "resident".

    This usually happens when you are missing a useResource or useHeap call in the command encoder that does the reads (or writes). This applies to textures as well as to other resources. Every resource occupies some range of virtual memory, which needs a GPU mapping, and that mapping is "created" when the command encoder containing the useResource call is submitted in a command buffer.

    Residency is usually not a problem when you are not using argument buffers, because the API can see explicitly which resources you are using and can make them resident for you implicitly. With argument buffers, however, it doesn't know exactly which resources you are using, so you need to tell it.

    It can also happen if your resource is actually present, but you are overrunning its length dramatically, usually by more than the size of a virtual memory page. For example, if you have a buffer that is 1024 bytes long but read from it at an offset greater than 16,384, you might land in another page that is not mapped and get a page fault.
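
    As a rough sketch of the missing residency call described above, the Swift snippet below declares both textures referenced by SkyParams before the draw. The helper name, its parameter names, and the buffer index are hypothetical: the question does not show the host-side encoding code, so this would need adapting to however the argument buffer is actually bound.

    ```swift
    import Metal

    // Hypothetical helper (not from the question): declares the textures that the
    // SkyParams argument buffer points at, so Metal makes them resident for the pass.
    func bindSkyParams(encoder: MTLRenderCommandEncoder,
                       argumentBuffer: MTLBuffer,
                       cubeMap: MTLTexture,
                       lutMap: MTLTexture,
                       bufferIndex: Int) {
        // Resources reached only through an argument buffer are invisible to the
        // encoder, so each one is declared explicitly.
        // Assumption: the fragment shader only reads these textures, hence `.read`.
        encoder.useResource(cubeMap, usage: .read, stages: .fragment)
        encoder.useResource(lutMap, usage: .read, stages: .fragment)

        // Bind the argument buffer itself at the (assumed) fragment buffer index.
        encoder.setFragmentBuffer(argumentBuffer, offset: 0, index: bufferIndex)
    }
    ```

    If the textures are allocated from an MTLHeap, a single useHeap call on the encoder can stand in for the per-texture calls. Older Metal versions also defined a .sample usage flag for sampled textures (since deprecated), so when targeting those it may be worth passing [.read, .sample] instead.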