Tags: unity-game-engine, shader, directx-11, hlsl, compute-shader

Read groupshared variables back to cpu memory


First of all, is it even possible to read groupshared data back? Or does groupshared data have to be copied to some RWBuffer before transferring it to CPU memory? I ask because RWBuffers can't be groupshared (I'm assuming that's because the size of the buffer isn't known at compile time).

For those interested, this is the error it throws when declaring a groupshared buffer: Shader error in 'FOWComputeShader': 'Result': groupshared variables cannot hold resources at kernel CSMain at ...

Basically I'm declaring a big groupshared uint array in the shader, worth 16 KB. In the main code I link a ComputeBuffer to this groupshared array, dispatch the shader, and then read back from the buffer. Sadly, the data I read back is all 0.

I'm working in a unity environment with a compute shader, setting my buffer up like this:

// MapSize is 128 * 128, so 16kb
// sizeof(uint) is the stride size
// ComputeBufferType.Raw, because I intend to use each uint as 4 bytes later on, so I don't want funny stuff to happen to the values
ComputeBuffer FOWMapBuffer = new ComputeBuffer(MapSize,  sizeof(uint), ComputeBufferType.Raw);
FOWComputeShader.SetBuffer(kernel, "_FoWMap", FOWMapBuffer);

//just the dispatch
int ThreadCount = Mathf.CeilToInt((float)FOWdata.Count / ThreadGroupSizeX);
FOWComputeShader.Dispatch(kernel, ThreadCount, 1, 1);

//outVisibleToFaction is a byte array of 128 * 128 size
FOWMapBuffer.GetData(outVisibleToFaction);
FOWMapBuffer.Dispose();

Then inside the shader:

// 4096 uints * 4 bytes per uint = 16kb
#define FoWMap_Size 4096
groupshared uint _FoWMap[FoWMap_Size];

[numthreads(32,1,1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    for (uint i = 0; i < FoWMap_Size; i++)
    {
        _FoWMap[i] = i;
    }
}

That's my environment. Does anyone know whether reading back groupshared data is possible and, if so, why my buffer reads back all 0s?


Solution

  • No, you can't access groupshared memory on the CPU directly. Groupshared memory is a block of on-chip memory and, as the name suggests, it is only shared between the threads inside a single group. So there isn't one single groupshared memory, but rather multiple instances (which may or may not co-exist, depending on hardware and shader). The lifetime of each block of groupshared memory ends once the thread group it belongs to finishes executing (which allows the hardware to re-use that memory for the next thread group). In your case, for example, you're actually dispatching ThreadCount groups, so there will be that many logical blocks of groupshared memory, each 16 KB in size.

    In summary, groupshared memory is more like a temporary cache that the threads inside your thread group can use to communicate with each other. Nothing outside of the 32 threads in your thread group knows about the content or even the existence of that memory (since it only really exists while these threads are executing).

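    If anything outside of these 32 threads needs access to the data, including the CPU, you will need to write it out to a read-write buffer (for example an RWStructuredBuffer or RWByteAddressBuffer) that is bound to the kernel.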
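
The write-out step described above could look roughly like the following. This is only a sketch under a few assumptions, not the asker's actual shader: the output buffer name `_FoWMapOut` and the `GROUP_SIZE` define are made up for illustration, the buffer is assumed to be bound from C# with `SetBuffer` and to hold at least 16 KB, and a single thread group is assumed to be dispatched (with several groups, each one would overwrite the same region unless the offset also depends on `SV_GroupID`). Since the question creates its buffer with `ComputeBufferType.Raw`, the sketch declares it as an `RWByteAddressBuffer`.

```hlsl
// Sketch only: _FoWMapOut and GROUP_SIZE are illustrative names, not from the question.
#define FoWMap_Size 4096   // 4096 uints * 4 bytes per uint = 16 KB
#define GROUP_SIZE  32

// Scratch memory, private to one thread group and gone when the group finishes.
groupshared uint _FoWMap[FoWMap_Size];

// Device memory that the CPU can read back with ComputeBuffer.GetData.
// Assumed to be bound via SetBuffer(kernel, "_FoWMapOut", buffer).
RWByteAddressBuffer _FoWMapOut;

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint groupIndex : SV_GroupIndex)
{
    // Each of the 32 threads fills its own strided slice of the shared array.
    for (uint i = groupIndex; i < FoWMap_Size; i += GROUP_SIZE)
    {
        _FoWMap[i] = i;
    }

    // Make all groupshared writes visible to the whole group. Required as soon
    // as a thread reads entries written by another thread; shown here because
    // it is the usual pattern before a copy-out.
    GroupMemoryBarrierWithGroupSync();

    // Copy the shared array into the bound buffer. Store takes a byte offset,
    // so element j lives at byte j * 4.
    for (uint j = groupIndex; j < FoWMap_Size; j += GROUP_SIZE)
    {
        _FoWMapOut.Store(j * 4, _FoWMap[j]);
    }
}
```

On the C# side nothing about the readback changes: the buffer passed to `SetBuffer` just has to be bound under the name of the read-write buffer rather than the name of the groupshared array, since the groupshared array itself can never be bound to a resource.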