I've implemented occlusion culling via a compute shader, in conjunction with indirect rendering, in HLSL on DX12.
I would like to read back to the CPU a count of the objects that have been culled, so that I can print it to the console.
My code is mostly adapted from existing examples, and I'm not sure of the best way to do what I guess is a reduction. I've seen things like InterlockedAdd, but I don't know whether that's the route to take either.
My current code looks like this (details of culling omitted):
SamplerState DepthSampler : register(s0);

StructuredBuffer<IndirectCommand> inputCommands : register(t0);         // SRV: Indirect commands
StructuredBuffer<VSIndirectConstants> indirectConstants : register(t1); // SRV: Per-object constants
StructuredBuffer<TransformData> TransformBuffer : register(t2);         // SRV: Transforms (per object)
Texture2D<float> DepthTexture : register(t3);
AppendStructuredBuffer<IndirectCommand> outputCommands : register(u0);  // UAV: Processed indirect commands

bool isOccluded(uint index)
{
    bool occluded = false;
    uint transformIndex = indirectConstants[index].transformIndex;
    TransformData tData = TransformBuffer[transformIndex];
    VSIndirectConstants constants = indirectConstants[index];
    ...
}

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Don't attempt to access commands that don't exist if more threads are allocated
    // than commands.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
    }
}
I'd just like to increment a counter somewhere whenever I cull an object, and be able to read it back on the CPU efficiently. I'd appreciate suggestions on the approach to take; it shouldn't need too much detail code-wise.
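A rough sketch would be like this:
1. Create a buffer with R32_UINT format and create its UAV.
2. Increment it in the shader with InterlockedAdd, as you mentioned.
3. Create a readback buffer and use CopyBufferRegion to copy from your original buffer to the readback buffer.
4. Map the readback buffer on the CPU and read the value as a uint32.
This is without going into too much API detail.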
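The shader side of this sketch could look roughly like the following. It is a minimal sketch built on the question's own shader, not a definitive implementation: the name culledCount and the register u1 are assumptions, and it relies on the declarations already present in the question (inputCommands, outputCommands, commandCount, threadBlockSize, isWithinFrustum, isOccluded). It also assumes the counter is cleared to zero before each dispatch.

```hlsl
// Hypothetical counter: a one-element typed buffer viewed as R32_UINT.
// The name and the u1 slot are assumptions; use whichever UAV slot is free
// in your root signature. It must be reset to 0 before every dispatch.
RWBuffer<uint> culledCount : register(u1);

[numthreads(threadBlockSize, 1, 1)]
void main(uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    // Each thread of the CS operates on one of the indirect commands.
    uint index = (groupId.x * threadBlockSize) + groupIndex;

    // Skip threads that fall past the end of the command list.
    if (index < (uint)commandCount)
    {
        if (isWithinFrustum(index) && !isOccluded(index))
        {
            outputCommands.Append(inputCommands[index]);
        }
        else
        {
            // The object was culled: one atomic increment per culled object.
            InterlockedAdd(culledCount[0], 1);
        }
    }
}
```

The value read on the CPU is only valid once the GPU has finished both the dispatch and the copy to the readback buffer, so the readback is typically done after waiting on a fence, or a frame or two later to avoid stalling.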
The approach you mentioned in the comments, with a UAV counter, is also possible, but it doesn't make your life much easier. You still need to create the buffer with R32_UINT format (which will be your counter buffer), only this time, when you create the UAV of your outputCommands with CreateUnorderedAccessView, the pCounterResource argument won't be nullptr. You will save yourself writing the InterlockedAdd code, though. Note that this counter holds the number of commands that were appended, so the culled count is commandCount minus that value. The other part of the procedure will be the same if you want to read it on the CPU.