OpenGL atomic counters buffer aliasing performance

If I use an atomic counter in, say, shader 'A' for a rendering/compute dispatch, and then alias that counter's buffer in shader 'B' for a following rendering/compute dispatch, declaring it there as a uniform block or SSBO instead of an actual atomic counter, are there any performance implications I should be aware of? (Assume the appropriate glMemoryBarrier() calls are in place, etc.)

I know that some AMD hardware, at least, has a limited number of dedicated atomic hardware units. I guess in that case the counter results get written to the buffer later if it is aliased as an SSBO? So might it be better to never alias the atomic counter to anything that isn't explicitly declared as an atomic counter?

Before anyone asks: I don't have access to a wide range of relevant hardware to test which is best myself, so I wondered whether there is a general rule of thumb for this.

Solution

Performance is not your problem here. Your problem here is whether it will work at all.

By the rules of OpenGL's memory model, if a rendering operation manipulates atomic counters and you read from the backing buffer after that operation, then you are required to be able to see the result of all of those atomic counter manipulations. That is what synchronous execution means here.

That's all true... unless you try to read from it as an atomic counter. Because then, the execution model becomes... less well-defined. Implementations are not required to synchronize atomic counter accesses across rendering calls.

Now, the nature of atomic counter operations doesn't change because of this. For example, you are still guaranteed to get unique values across rendering calls if you only ever increment or decrement a counter. And if you modify the counter's buffer storage, then execute a counter operation, the next operation will see the modifications (due to synchronous operations).

But if the only operations that modify this memory are atomic counter operations, then an atomic counter read is not guaranteed to see the atomic counter operations from prior rendering commands.

So yes, accessing the atomic counter's variable will be faster than using an SSBO or UBO or whatever. But you won't get the right answer. So it's not a good trade-off ;)
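To make the setup from the question concrete, here is a minimal sketch of the two declarations and of which barrier bit matches which kind of follow-up access. It is only an illustration under some assumptions: the shader sources, the `NextAccess` enum, the `barrierBitsFor` helper, and the names `buf`, `groups`, `progA` and `progB` are all hypothetical, and the barrier constants are copied by value from the GL headers so the snippet compiles without a loader. The one fact it leans on is that the bits passed to glMemoryBarrier() describe how the memory will be accessed after the barrier, not how it was written before it.

```cpp
#include <cassert>
#include <cstdint>

// Values of the real GL enums, copied here only so this sketch builds
// without a GL loader. In a real program use the names from your GL header.
constexpr std::uint32_t kUniformBarrierBit       = 0x00000004u; // GL_UNIFORM_BARRIER_BIT
constexpr std::uint32_t kAtomicCounterBarrierBit = 0x00001000u; // GL_ATOMIC_COUNTER_BARRIER_BIT
constexpr std::uint32_t kShaderStorageBarrierBit = 0x00002000u; // GL_SHADER_STORAGE_BARRIER_BIT

// Dispatch 'A' (hypothetical): the buffer is declared as an atomic counter.
const char* const kShaderA = R"glsl(#version 430
layout(local_size_x = 64) in;
layout(binding = 0, offset = 0) uniform atomic_uint counter;
void main() {
    atomicCounterIncrement(counter);
}
)glsl";

// Dispatch 'B' (hypothetical): the same buffer object, aliased as an SSBO.
const char* const kShaderB = R"glsl(#version 430
layout(local_size_x = 1) in;
layout(std430, binding = 0) buffer CounterAlias {
    uint count;            // the same 4 bytes dispatch 'A' incremented
};
layout(std430, binding = 1) buffer Result {
    uint result;
};
void main() {
    result = count;        // plain buffer read, not an atomic counter read
}
)glsl";

// How the next command is going to access the counter's memory.
enum class NextAccess { AtomicCounter, ShaderStorage, UniformBlock };

// Pick the glMemoryBarrier() bit for the access that comes AFTER the barrier.
inline std::uint32_t barrierBitsFor(NextAccess next) {
    switch (next) {
        case NextAccess::AtomicCounter: return kAtomicCounterBarrierBit;
        case NextAccess::ShaderStorage: return kShaderStorageBarrierBit;
        case NextAccess::UniformBlock:  return kUniformBarrierBit;
    }
    assert(false && "unhandled NextAccess value");
    return 0u;
}

// Host-side order, shown as comments because the calls need a live context:
//   glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, buf);
//   glUseProgram(progA); glDispatchCompute(groups, 1, 1);
//   glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);   // B reads via an SSBO
//   glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf);
//   glUseProgram(progB); glDispatchCompute(1, 1, 1);
```

With something like this, the call between the two dispatches would be `glMemoryBarrier(barrierBitsFor(NextAccess::ShaderStorage))`. If 'B' declared the buffer as a uniform block instead, the matching choice would be `NextAccess::UniformBlock`.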