Implementing blending functions too complicated for fixed-function blending

I'm trying to implement advanced blending on the GPU. But whenever I try searching for resources, I get overwhelmed by terms like order-independent transparency and fragment shader interlock, and I don't know which ones are relevant to my problem.

Put simply: given 3 objects (b0, b1, b2) which all cover the same pixel and have a certain z-order (you can assume the objects are correctly sorted),

the final color of that pixel should be

f( b2, f( b1, f( b0 , background_color ) ) )

where f is an arbitrary function (decided at compile time) of type (vec4, vec4) -> vec4 which cannot be expressed with fixed-function blending.

My instinct would be to use code which looks like

void main() {
  vec4 previous_color = //...
  gl_FragColor = f( object_color, previous_color );
}

However, from my limited research and understanding, you cannot simply read already-rendered data inside a fragment shader. (Hence the reason for VK_EXT_fragment_shader_interlock, I would assume.)

If I can't rely on such extensions, can I still implement this on a GPU?

Also an explanation of why the "naïve" implementation would cause issues would be appreciated.

Solution

Linked-list-based order-independent transparency (OIT) gets you this for free, by virtue of the fact that the algorithm doesn't use fixed-function blending at all. When resolving blending, the fragment shader (or compute shader, but the FS is fine) gets a list of all the colors that affect a particular fragment, and it must use these colors (and their depths) to compute the blending result itself. So all of the information is right there.
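As a rough illustration of that resolve step, here is a minimal sketch of a full-screen resolve fragment shader. The buffer layout, the binding numbers, the names (`head_pointers`, `NodeBuffer`, `END_OF_LIST`, `MAX_FRAGMENTS`, `BACKGROUND`) and the body of `f` are all assumptions made up for the example, not part of any standard API. It assumes an earlier pass stored every fragment of a pixel in a per-pixel linked list, and that a larger depth means farther away.

```glsl
#version 450

// Hypothetical layout: one list head per pixel, all nodes in one SSBO.
layout(set = 0, binding = 0, r32ui) readonly uniform uimage2D head_pointers;

struct Node {
    vec4  color;
    float depth;
    uint  next;   // index of the next node, or END_OF_LIST
};

layout(std430, set = 0, binding = 1) readonly buffer NodeBuffer {
    Node nodes[];
};

layout(location = 0) out vec4 out_color;

const uint END_OF_LIST   = 0xFFFFFFFFu;
const int  MAX_FRAGMENTS = 16;                       // assumed per-pixel cap
const vec4 BACKGROUND    = vec4(0.0, 0.0, 0.0, 1.0); // assumed clear color

// Placeholder for the arbitrary f(object_color, previous_color).
vec4 f(vec4 src, vec4 dst) {
    return vec4(mix(dst.rgb, src.rgb, src.a), 1.0);
}

void main() {
    vec4  colors[MAX_FRAGMENTS];
    float depths[MAX_FRAGMENTS];
    int   count = 0;

    // Walk this pixel's list and copy it into local arrays.
    uint index = imageLoad(head_pointers, ivec2(gl_FragCoord.xy)).r;
    while (index != END_OF_LIST && count < MAX_FRAGMENTS) {
        colors[count] = nodes[index].color;
        depths[count] = nodes[index].depth;
        index = nodes[index].next;
        ++count;
    }

    // Insertion sort, farthest fragment first.
    for (int i = 1; i < count; ++i) {
        vec4  c = colors[i];
        float d = depths[i];
        int   j = i - 1;
        while (j >= 0 && depths[j] < d) {
            colors[j + 1] = colors[j];
            depths[j + 1] = depths[j];
            --j;
        }
        colors[j + 1] = c;
        depths[j + 1] = d;
    }

    // Fold f from back to front: f(b2, f(b1, f(b0, background))).
    vec4 result = BACKGROUND;
    for (int i = 0; i < count; ++i) {
        result = f(colors[i], result);
    }
    out_color = result;
}
```

Fragments beyond `MAX_FRAGMENTS` are simply ignored in this sketch; a real implementation would need some policy for that case.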

Of course, OIT isn't exactly cheap. If ordering your scene is difficult, it can be worth the cost (especially if computing the order involves a lot of CPU interaction), but it isn't free.

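OIT relies on nothing more than the ability to have large SSBOs and atomic increments. These are core Vulkan functionality, although stores and atomics from a fragment shader do depend on the fragmentStoresAndAtomics device feature, which is very widely supported.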
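To show where the SSBO and the atomic increment come in, here is a hedged sketch of the pass that builds the per-pixel lists. Every name and binding here (`head_pointers`, `NodeBuffer`, `Counter`, `node_count`) is hypothetical. It assumes the application resets `node_count` to 0 and clears every entry of `head_pointers` to 0xFFFFFFFF before the pass, and that color writes are disabled since nothing is written to an attachment.

```glsl
#version 450

// Hypothetical layout; must match whatever the resolve pass reads.
layout(set = 0, binding = 0, r32ui) coherent uniform uimage2D head_pointers;

struct Node {
    vec4  color;
    float depth;
    uint  next;   // index of the next node, or 0xFFFFFFFF for end of list
};

layout(std430, set = 0, binding = 1) buffer NodeBuffer {
    Node nodes[];
};

layout(std430, set = 0, binding = 2) buffer Counter {
    uint node_count;   // assumed to be reset to 0 before this pass
};

layout(location = 0) in vec4 object_color;

void main() {
    // Allocate one node from the pool with an atomic increment.
    uint index = atomicAdd(node_count, 1u);
    if (index >= uint(nodes.length())) {
        return;   // pool exhausted: this fragment is dropped
    }

    // Make the new node the head of this pixel's list.
    uint previous_head =
        imageAtomicExchange(head_pointers, ivec2(gl_FragCoord.xy), index);

    nodes[index].color = object_color;
    nodes[index].depth = gl_FragCoord.z;
    nodes[index].next  = previous_head;
}
```

The lists come out in arbitrary order, which is why the resolve step has to sort by depth before applying f.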

OIT aside, what you're ultimately doing is having an FS perform an ordered, atomic read/modify/write of an image, where the "modify" operation is non-trivial (i.e. not an increment or the like).

Unextended Vulkan can do this through input attachments and pipeline barriers.

You have to have a subpass that uses an attachment as both a color attachment and an input attachment. Then, you render one non-overlapping object (that's very important). The previous_color in your FS now comes from a subpassLoad operation. This function is given no texture coordinates, as it always reads from the texel corresponding to the current fragment location. That is, it reads the color already in the framebuffer underneath this fragment. The output goes to the color attachment, which is the same image as the input attachment.
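A minimal sketch of what such a fragment shader could look like, assuming the same image is bound as color attachment 0 and as input attachment 0 of the subpass. The binding numbers, the name `previous` and the body of `f` are placeholders. As far as I know, an attachment used both ways in one subpass also has to be in VK_IMAGE_LAYOUT_GENERAL.

```glsl
#version 450

// Assumption: input attachment 0 is the same image as color attachment 0.
layout(input_attachment_index = 0, set = 0, binding = 0)
    uniform subpassInput previous;

layout(location = 0) in  vec4 object_color;
layout(location = 0) out vec4 out_color;

// Placeholder for the arbitrary f(object_color, previous_color).
vec4 f(vec4 src, vec4 dst) {
    return vec4(mix(dst.rgb, src.rgb, src.a), 1.0);
}

void main() {
    // No coordinates: this reads the texel under the current fragment.
    vec4 previous_color = subpassLoad(previous);
    out_color = f(object_color, previous_color);
}
```

This is essentially the shader from the question, with `subpassLoad` filling in the missing `previous_color`.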

However, you can only do this once per pixel. That is, each pixel can only have a single read/modify/write. To get a second one, you must issue a pipeline barrier between the drawing of one object and the drawing of the next. This means that any set of objects that overlap must be split into separate chunks within the subpass, with a barrier between chunks. The pipeline barrier itself is special, as it must match a subpass self-dependency declared in the render pass definition.

For efficient rendering, you need to break your draw calls up into chunks of drawing where there are no overlapping pixels within each chunk.

Fragment shader interlock is another mechanism, but it does not apply to the fragment shader's outputs. It's only useful if you want to do atomic RMWs to memory via image load/store or SSBOs. So you would have to render the main scene, end the render pass, and start a new render pass that doesn't use that image as an attachment, then do your interlocked image load/stores to it. After that, you end that render pass before any other post-processing you may need.
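For comparison, here is a hedged sketch of the interlock variant, written against GL_ARB_fragment_shader_interlock, the GLSL extension that glslang compiles for VK_EXT_fragment_shader_interlock. The image `target`, its `rgba8` format, the binding numbers and the body of `f` are assumptions. Note that `target` is bound as a storage image, not as a color attachment.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Critical sections for the same pixel run in primitive order.
layout(pixel_interlock_ordered) in;

// Assumption: the image being blended into, bound as a storage image.
layout(set = 0, binding = 0, rgba8) coherent uniform image2D target;

layout(location = 0) in vec4 object_color;

// Placeholder for the arbitrary f(object_color, previous_color).
vec4 f(vec4 src, vec4 dst) {
    return vec4(mix(dst.rgb, src.rgb, src.a), 1.0);
}

void main() {
    ivec2 pixel = ivec2(gl_FragCoord.xy);

    // Only one invocation per pixel is inside this section at a time.
    beginInvocationInterlockARB();
    vec4 previous_color = imageLoad(target, pixel);
    imageStore(target, pixel, f(object_color, previous_color));
    endInvocationInterlockARB();
}
```

With the ordered qualifier, overlapping objects can be drawn back to back without barriers, at whatever cost the hardware pays for serializing fragments on the same pixel.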