Tags: hlsl, compute-shader, directx-12

How to return an object count from a compute shader


I've implemented occlusion culling via a compute shader in conjunction with indirect rendering in HLSL on DX12.

I would like to read back to the CPU the number of objects that have been culled, so I can print it to the console.

The code I have is mostly based on existing examples, and I'm not really aware of the best methods for doing what I guess is a reduction. I've seen things like InterlockedAdd, but I don't know if that's the route to take either.

My current code looks like this (details of culling omitted):

SamplerState DepthSampler                                  : register(s0);
StructuredBuffer<IndirectCommand> inputCommands            : register(t0);      // SRV: Indirect commands
StructuredBuffer<VSIndirectConstants> indirectConstants    : register(t1);      // SRV: of per-object constants
StructuredBuffer<TransformData> TransformBuffer            : register(t2);      // SRV: transforms (per object)
Texture2D<float> DepthTexture                              : register(t3);
AppendStructuredBuffer<IndirectCommand> outputCommands     : register(u0);      // UAV: Processed indirect commands

bool isOccluded(uint index)
{
    bool occluded = false;
    VSIndirectConstants constants = indirectConstants[index];
    TransformData tData = TransformBuffer[constants.transformIndex];
    ...
}

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Don't attempt to access commands that don't exist if more threads are allocated
    // than commands.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
    }
}

I'd just like to increment a counter somewhere whenever I cull an object and be able to read it back on the CPU efficiently. I'd appreciate suggestions on which approach to take; I shouldn't need too much detail code-wise.


Solution

  • A rough sketch would look like this:

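    1. Create a small buffer (4 bytes is enough for one counter) on a default heap with D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS, and create a UAV for it with format R32_UINT. (Buffer resources themselves are created with DXGI_FORMAT_UNKNOWN; the R32_UINT format goes on the view.)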
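    2. In the shader, increment it with InterlockedAdd, as you mentioned. Reset it to zero before each dispatch (for example with CopyBufferRegion from a buffer holding a zero, or with ClearUnorderedAccessViewUint); otherwise the count keeps accumulating across frames.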
    3. Also create a readback buffer (on a D3D12_HEAP_TYPE_READBACK heap) of the same size. You will need one per back buffer you are using (i.e. per frame in flight), or a single buffer large enough to hold all of those "sub-buffers".
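    4. Use CopyBufferRegion to copy from the counter buffer into the readback buffer (or into the current frame's slot of it). The counter buffer has to be transitioned to D3D12_RESOURCE_STATE_COPY_SOURCE for the copy.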
    5. Finally, once the GPU has finished that frame (e.g. after waiting on the frame's fence), map the readback buffer if you haven't already and read the uint32.
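    For step 2, here is a minimal sketch of the shader side. It assumes everything else (the structs, the SRVs, commandCount, threadBlockSize, isWithinFrustum and isOccluded) stays exactly as in the question; the name culledCount and the register u1 are hypothetical, so use whichever free UAV slot your root signature exposes, with the UAV created as R32_UINT as in step 1.

    ```hlsl
    // Hypothetical extra UAV: one 32-bit counter, viewed as DXGI_FORMAT_R32_UINT.
    // Assumes the host zeroes culledCount[0] before every dispatch.
    RWBuffer<uint> culledCount : register(u1);

    [numthreads(threadBlockSize, 1, 1)]
    void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
    {
        // Each thread of the CS operates on one of the indirect commands.
        uint index = (groupId.x * threadBlockSize) + groupIndex;

        // Skip threads beyond the last command.
        if (index < (uint)commandCount)
        {
            if (isWithinFrustum(index) && !isOccluded(index))
            {
                // Visible: emit the draw as before.
                outputCommands.Append(inputCommands[index]);
            }
            else
            {
                // Culled: many threads may hit the same address at once,
                // so the increment has to be atomic.
                InterlockedAdd(culledCount[0], 1u);
            }
        }
    }
    ```

    If contention on that single address ever shows up in profiles, Shader Model 6.0 wave intrinsics allow a per-wave reduction (one lane adds WaveActiveCountBits of the culled flag instead of every culled thread adding 1), but for a per-frame statistic the plain atomic is usually fine.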

    This leaves out most of the API details.

    The approach you mentioned in the comments, using the UAV counter, also works, but it doesn't make your life much easier. The counter is a 32-bit value in a buffer that you pass as the pCounterResource argument of CreateUnorderedAccessView when you create the UAV for outputCommands; since outputCommands is an AppendStructuredBuffer, that counter resource must already exist for Append to work. It counts the appended (visible) commands, so the culled count is commandCount minus that value. You save yourself writing the InterlockedAdd code, but reading the value on the CPU works the same way as in steps 3 to 5 above.