Compute Shader uniform updates within a loop

I have an OpenGL Compute Shader that has workgroups dispatched in each iteration of a loop. There is a unique uniform value, representing an ID, that will need to be passed. Each ID is unique to the set of shader invocations that are generated from each dispatch call.

Is it possible to keep the value unique within each shader invocation set simply by re-assigning a value using a mapped pointer to a UBO within the loop? From testing, it looks like only one value reaches all shader invocation sets within a single frame. Please correct me if I'm wrong.

Are there other ways to pass unique values to entire work group sets without sacrificing performance? If not, how could this be solved if performance weren't a concern?

For more context, I'm attempting to implement something similar to the loop found in the link below, using OpenGL instead of DirectX:

https://github.com/GPUOpen-LibrariesAndSDKs/GPUParticles11/blob/master/gpuparticles11/src/GPUParticleSystem.cpp#L1163

In the example above, there is a map and unmap operation prior to updating a Constant Buffer. Perhaps this will need to be done with OpenGL instead of using a persistent map? Or could I be missing flags?

Solution

Is it possible to keep the value unique within each shader invocation set simply by re-assigning a value using a mapped pointer to a UBO within the loop?

Yes, if you synchronize the modification of that memory with OpenGL. With persistently mapped memory, synchronizing host writes with GPU reads is now your responsibility.

In order to do what you suggest, you would need to effectively issue a glFinish call after each loop iteration, so that the CPU would not attempt to modify that memory until the GPU is finished reading from it.

This is obviously a bad idea, so don't do that. Doing a bunch of map/unmap calls between each dispatch is a performance killer too. Odds are good that at some point, the driver will have to do the same thing as issuing a glFinish in each iteration. Even if that doesn't happen, the implementation will have to do a lot of allocation work behind the scenes to make it performance-friendly.

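For a simple numeric identifier, just use a glUniform call per iteration. A uniform set this way takes effect for the dispatch commands issued after it, so each dispatch sees its own ID without any extra synchronization on your part.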
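As a rough illustration of the per-iteration glUniform approach, here is a minimal sketch. Everything named here is hypothetical: `ComputeApi` is a small table of function pointers with the same parameter lists as glUniform1ui and glDispatchCompute, used only so the loop can be compiled and exercised without a GL context, and `dispatchPerId`, `idLocation`, `firstId`, and `groupsX` are made-up names. It assumes the compute program is already bound and that the shader declares the ID as a `uniform uint`.

```cpp
#include <cassert>

// Stand-ins for the GL typedefs so the sketch compiles without GL headers.
// In a real program these come from your GL loader header.
using GLint = int;
using GLuint = unsigned int;

// Hypothetical table of the two entry points used below, with the same
// parameter lists as glUniform1ui and glDispatchCompute.
struct ComputeApi {
    void (*uniform1ui)(GLint location, GLuint value);
    void (*dispatchCompute)(GLuint x, GLuint y, GLuint z);
};

// Issues one dispatch per ID, setting the uniform right before each one.
// Assumes the compute program is already bound (glUseProgram) and that
// idLocation was obtained from glGetUniformLocation.
void dispatchPerId(const ComputeApi& gl, GLint idLocation,
                   GLuint firstId, GLuint idCount, GLuint groupsX) {
    assert(gl.uniform1ui && gl.dispatchCompute);
    for (GLuint i = 0; i < idCount; ++i) {
        // The value set here is the one seen by the dispatch that follows.
        gl.uniform1ui(idLocation, firstId + i);
        gl.dispatchCompute(groupsX, 1, 1);
    }
}
```

In real code you would most likely drop the table and call glUniform1ui and glDispatchCompute directly inside the loop. If later dispatches read what earlier ones wrote, a glMemoryBarrier between them may also be needed; that is outside what this sketch shows.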