Tags: direct3d, direct3d11, direct3d10

Why do we still use fixed-function blending operations in D3D11, etc.?


I was looking around trying to understand why we are still using fixed-function blending modes in newer 3D APIs (like D3D11). In D3D10, fixed-function alpha clipping (the alpha test) was removed in favor of doing it in the shader. Why? Because shaders are a much more powerful approach for almost any situation.

So why, then, can we not calculate our own blending operations (i.e., sample from the render target we are currently rendering into)? Is there some hardware design issue in the video card pipeline that makes this difficult to accomplish?

The reason this would be useful is that you could make things like refraction shaders run much faster, since you wouldn't have to swap back and forth between two render targets for each overlapping refractive object (sketched below), such as in a refractive windowing system for an OS or a game UI.
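
To make that cost concrete, here is a rough sketch of the copy-then-sample pattern being described. The resource names and the per-object draw call are hypothetical; only the ID3D11DeviceContext methods are real API:

    #include <d3d11.h>

    // Sketch: because a pixel shader cannot sample the render target it is
    // writing to, each refractive object needs a snapshot of the scene so far.
    void DrawRefractiveObjects(ID3D11DeviceContext* ctx,
                               ID3D11RenderTargetView* sceneRTV,
                               ID3D11Texture2D* sceneTex,      // resource behind sceneRTV
                               ID3D11Texture2D* sceneCopy,     // second texture, same description
                               ID3D11ShaderResourceView* sceneCopySRV,
                               int objectCount)                // hypothetical object list
    {
        for (int i = 0; i < objectCount; ++i)
        {
            // Unbind the scene target so it can be copied from safely.
            ID3D11RenderTargetView* nullRTV = nullptr;
            ctx->OMSetRenderTargets(1, &nullRTV, nullptr);

            // Snapshot everything rendered so far into the second texture.
            ctx->CopyResource(sceneCopy, sceneTex);

            // Sample the snapshot while rendering back into the scene.
            ctx->PSSetShaderResources(0, 1, &sceneCopySRV);
            ctx->OMSetRenderTargets(1, &sceneRTV, nullptr);

            // DrawRefractiveObject(ctx, i);  // hypothetical per-object draw
        }
    }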

Since this is not a discussion forum, where would be the best place to suggest an idea like this? I would love to see it in D3D12. Or is this already possible in D3D11?


Solution

  • So why then can we not calculate our own blending operations

    Who says you can't? With shader_image_load_store (and its D3D11 equivalent, unordered access views), you can do pretty much anything you want with images, provided that you follow the rules. That last part is generally what trips people up. Doing a full read/modify/write in a shader, such that later fragment shader invocations don't read the wrong value, is almost impossible in the general case. You have to restrict it by requiring that each rendered object not overlap with itself, and you have to insert a memory barrier between rendered objects (which can overlap with other rendered objects). Or you use the linked-list approach.
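
    As a minimal sketch of the D3D11 side of that binding: the function and variable names below are made up, but OMSetRenderTargetsAndUnorderedAccessViews is the real device-context call:

        #include <d3d11.h>

        // Bind a texture as a UAV alongside the render target so the pixel
        // shader can do its own read/modify/write, the D3D11 analogue of
        // GL's shader_image_load_store.
        void BindImageForShaderBlend(ID3D11DeviceContext* ctx,
                                     ID3D11RenderTargetView* rtv,
                                     ID3D11DepthStencilView* dsv,
                                     ID3D11UnorderedAccessView* imageUAV)
        {
            // UAV slots share space with render target slots, so the UAV
            // goes in slot 1, after the single render target in slot 0.
            ctx->OMSetRenderTargetsAndUnorderedAccessViews(
                1, &rtv, dsv,     // one RTV plus a depth-stencil view
                1, 1, &imageUAV,  // UAV start slot, count, views
                nullptr);         // no initial counts (not append/consume)
        }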

    But the point is this: with these mechanisms, not only have people implemented blending in shaders, but they've implemented order-independent transparency (via linked lists). Nothing is stopping you from doing what you want right now.
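
    As a rough illustration of that linked-list approach, each fragment typically appends a node carrying a packed color, a depth, and a "next" index to a big per-pixel list buffer; the exact layout below is an assumption, not a fixed format:

        #include <cstdint>

        // One node of a per-pixel fragment list for order-independent
        // transparency. A separate "head pointer" buffer stores, for each
        // pixel, the index of the most recently appended node; a resolve
        // pass walks the list and sorts it by depth before compositing.
        struct FragmentNode
        {
            uint32_t packedColor; // RGBA8 color packed into 32 bits
            float    depth;       // used to sort fragments at resolve time
            uint32_t next;        // index of next node, 0xFFFFFFFF = end
        };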

    Well, nothing except performance of course. The fixed-function blender will always be faster because it can run in parallel with the fragment shader operations. The blending units are separate hardware from the fragment shaders, so you can be doing blending operations while simultaneously doing fragment shader ops (obviously from later fragments, not the ones being blended).
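
    For contrast, driving that dedicated blend hardware is just a small state object. A conventional alpha-blend setup in D3D11 looks roughly like this (CreateAlphaBlendState is an illustrative name; the D3D11 structures and calls are real):

        #include <d3d11.h>

        // Fixed-function alpha blending:
        // result.rgb = src.rgb * src.a + dst.rgb * (1 - src.a)
        ID3D11BlendState* CreateAlphaBlendState(ID3D11Device* device)
        {
            D3D11_BLEND_DESC desc = {};
            desc.RenderTarget[0].BlendEnable           = TRUE;
            desc.RenderTarget[0].SrcBlend              = D3D11_BLEND_SRC_ALPHA;
            desc.RenderTarget[0].DestBlend             = D3D11_BLEND_INV_SRC_ALPHA;
            desc.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_ADD;
            desc.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ONE;
            desc.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_INV_SRC_ALPHA;
            desc.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_ADD;
            desc.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;

            ID3D11BlendState* state = nullptr;
            device->CreateBlendState(&desc, &state);
            return state;  // bind with ctx->OMSetBlendState(state, nullptr, 0xFFFFFFFF)
        }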

    The read/modify/write mechanism in the blend hardware is designed specifically for blending, while image_load_store is a more generic mechanism. And while generic may beat specific in the long run of hardware evolution, for the immediate and near future you can expect fixed-function blending to beat image_load_store blending performance-wise every time.

    You should use it only when you must. And even then, decide whether you really, really need it.