opengl, glsl, gpu

GLSL fragment processing order for full screen quad


I'm using GLSL for image processing: I draw a full-screen quad and do the processing in the fragment shader. Can we expect fragments to be processed in any particular priority order?

I know the fragments are processed in parallel and we can't make any guarantees about when any particular fragment finishes, so how is this handled? Is it just a big queue? And what would the pattern look like, i.e. scanlines, blocks, etc.?

Will this be driver dependent?


Solution

  • There is no documentation on it because it is handled arbitrarily. Hardware is afforded the ability to process fragments in a completely arbitrary order; you are not allowed to know about the order of fragment processing in any way, shape, or form. There are no controls to change the order of fragment processing, affect that order, or even detect it at all.

    Well, that was true until OpenGL 4.2 and ARB_shader_image_load_store. But even that has controls built into it to allow the hardware as much freedom as possible.

    In short, if you're doing something where the order of processing matters, you're doing something wrong.
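To make the point about order concrete, here is a minimal CPU-side sketch in Python rather than GLSL. It assumes a hypothetical `shade` function standing in for a fragment shader that only reads the input image and writes its own output pixel; the names and the 3-tap blur are illustrative, not taken from the question. Under that assumption, any processing order produces the same image.

```python
import random

def shade(src, x, y):
    """Stand-in for a fragment shader: reads only the input, returns one output value."""
    w = len(src[0])
    # 3-tap horizontal box blur with clamp-to-edge addressing (illustrative choice)
    left = src[y][max(x - 1, 0)]
    right = src[y][min(x + 1, w - 1)]
    return (left + src[y][x] + right) / 3.0

def render(src, order):
    """Run shade once per pixel, visiting pixels in the given order."""
    h, w = len(src), len(src[0])
    dst = [[0.0] * w for _ in range(h)]
    for x, y in order:
        dst[y][x] = shade(src, x, y)
    return dst

# Small synthetic input image (hypothetical data)
src = [[float((x * 7 + y * 13) % 10) for x in range(8)] for y in range(4)]

scanline = [(x, y) for y in range(4) for x in range(8)]
shuffled = scanline[:]
random.Random(0).shuffle(shuffled)  # an arbitrary order, like the hardware might pick

order_independent = render(src, scanline) == render(src, shuffled)
```

Because each output pixel depends only on the read-only input, `order_independent` holds no matter how the visiting order is permuted.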

    It sounds to me like you're trying to do a feedback loop, where you read from and write to the framebuffer simultaneously (by binding a texture and attaching that same image to an FBO render target). That is not allowed.
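A common alternative to such a feedback loop is to ping-pong between two images: read from one, write to the other, then swap. This is not something the question mentions, so treat the following as a sketch under that assumption. It is a Python model on a 1D "image", with a hypothetical `blur_pass` standing in for one render pass. In real OpenGL the same-image case is not allowed at all; the model only illustrates why, by showing that the result then depends on processing order.

```python
def blur_pass(read, write, order):
    """One pass of a 3-tap blur, visiting pixels in the given order."""
    w = len(read)
    for x in order:
        left = read[max(x - 1, 0)]
        right = read[min(x + 1, w - 1)]
        write[x] = (left + read[x] + right) / 3.0

image = [0.0, 0.0, 9.0, 0.0, 0.0]
forward = list(range(len(image)))
backward = forward[::-1]

# Feedback loop model: the same buffer is both input and output.
a = image[:]
blur_pass(a, a, forward)
b = image[:]
blur_pass(b, b, backward)
feedback_depends_on_order = a != b

def ping_pong(image, passes, order):
    """Read from one buffer, write to the other, then swap their roles."""
    src, dst = image[:], [0.0] * len(image)
    for _ in range(passes):
        blur_pass(src, dst, order)
        src, dst = dst, src  # the buffer just written becomes the next input
    return src

ping_pong_is_stable = ping_pong(image, 2, forward) == ping_pong(image, 2, backward)
```

In an actual renderer the two buffers would be two textures attached to framebuffer objects, with the texture binding and render target swapped after each pass.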


    OK, so this is about performance, not functionality.

    You can assume what you normally would for CPUs: that memory accesses will be cached. The order in which the fragment shader invocations run doesn't matter; one of them will hit the memory first, and the one that hits it later benefits from the cache.

    Remember: GPUs are optimized for doing this stuff. GPUs sell based on how fast textured, shader-processed triangles are rendered. An implementation knows how textures are going to be used, and you can expect that it will not be stupid about how it orders the fragments it processes.
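The caching argument can be modeled with a deliberately idealized sketch in Python. It assumes a cache that never evicts and a hypothetical line width of 4 texels (`LINE_TEXELS`); real texture caches are finite and organized differently, so this only illustrates the "first one pays, later ones benefit" idea. Each fragment reads its own texel and its right-hand neighbor.

```python
import random

LINE_TEXELS = 4  # hypothetical cache line width, in texels

def count_misses(width, height, order):
    """Count cache misses and total accesses for a given fragment order."""
    cached = set()  # idealized cache: nothing is ever evicted
    misses = accesses = 0
    for x, y in order:
        # Each fragment samples its own texel and the one to its right (clamped).
        for tx in (x, min(x + 1, width - 1)):
            line = (y, tx // LINE_TEXELS)
            accesses += 1
            if line not in cached:
                misses += 1  # the first fragment to touch this line pays for it
                cached.add(line)
    return misses, accesses

W, H = 8, 8
scanline = [(x, y) for y in range(H) for x in range(W)]
shuffled = scanline[:]
random.Random(1).shuffle(shuffled)

scanline_stats = count_misses(W, H, scanline)
shuffled_stats = count_misses(W, H, shuffled)
```

In this model the miss count equals the number of distinct cache lines touched, whichever fragment gets there first. With a finite cache the order can change the numbers, which is the part the hardware is designed to handle sensibly.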

    If you need random access, then you need random access; there's not much you can do about it. But otherwise, this isn't something you should waste any time worrying about or trying to optimize around. Let the hardware and driver writers do their jobs.