I'm developing a VR application in Unity that uses a native plugin for video decoding and I wanted to do some processing on the decoded video frame.
My first step was to use a Unity compute shader dispatched from a C# script within the Unity application. This worked and gave the expected results, but I had a synchronisation issue: a parameter produced by the native plugin on the render thread needed to be fed into the compute shader, which was dispatched from the main thread.
I thought this could be solved by converting the Unity compute shader into a D3D11 compute shader and processing the decoded frame in the native plugin as soon as it pops out of the decoder. This also gave me the expected results but at a huge cost to performance. The application drops frames and when using RenderDoc to profile a single frame I'm seeing around 32ms for the compute dispatch call in the plugin compared to 3ms when using the Unity compute shader.
I cannot find any information on why there is such a discrepancy between the two. I have tried reducing the D3D11 shader to one that only writes out zeros, and the profiler still shows around 32ms, which makes me think the problem lies in how I set up the shader in the plugin.
I have included some code to show my setup and execution of the plugin compute shader.
Compute shader in native C++ plugin:
void process()
{
    ID3D11DeviceContext* ctx = NULL;
    device->GetImmediateContext(&ctx);
    ctx->UpdateSubresource(_pCB, 0, nullptr, &_bufferStruct, 0, 0);
    if (!_resourcesSet) {
        // Set read texture
        ID3D11ShaderResourceView * inY = nullptr;
        ID3D11ShaderResourceView * inU = nullptr;
        ID3D11ShaderResourceView * inV = nullptr;
        _inputTexture->getSRVs(&inY, &inU, &inV);
        // Set write texture
        ID3D11UnorderedAccessView * outY;
        ID3D11UnorderedAccessView * outU;
        ID3D11UnorderedAccessView * outV;
        _outputTexture->getUAVs(&outY, &outU, &outV);
        ctx->CSSetConstantBuffers(0, 1, &_pCB);
        ctx->CSSetShaderResources(0, 1, &inY);
        ctx->CSSetShaderResources(1, 1, &inU);
        ctx->CSSetShaderResources(2, 1, &inV);
        ctx->CSSetUnorderedAccessViews(0, 1, &outY, nullptr);
        ctx->CSSetUnorderedAccessViews(1, 1, &outU, nullptr);
        ctx->CSSetUnorderedAccessViews(2, 1, &outV, nullptr);
        ctx->CSSetShader(_computeShader, NULL, 0);
        _resourcesSet = true;
    }
    ctx->Dispatch(outputWidth / 8, outputHeight / 8, 1);
    ctx->Release();
}
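As a side note on the Dispatch call above: `outputWidth / 8` is an integer division, so a frame whose width or height is not a multiple of 8 would leave the last partial tile unprocessed. This is unlikely to be related to the timing problem, but a minimal sketch of rounding the group count up might look like the following. The names `groupCount`, `DispatchSize` and `dispatchSizeFor` are hypothetical helpers, not part of the plugin or of D3D11.

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed to cover `size` texels when each group
// handles `groupSize` texels along that axis. Rounds up so that a trailing
// partial tile still gets a group.
constexpr std::uint32_t groupCount(std::uint32_t size, std::uint32_t groupSize)
{
    return (size + groupSize - 1) / groupSize;
}

// Hypothetical holder for the three arguments of Dispatch().
struct DispatchSize
{
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Assumes the shader is declared with [numthreads(8,8,1)], as in the question.
inline DispatchSize dispatchSizeFor(std::uint32_t width, std::uint32_t height)
{
    constexpr std::uint32_t kThreadsPerAxis = 8; // must match numthreads in the shader
    assert(width > 0 && height > 0);
    return { groupCount(width, kThreadsPerAxis),
             groupCount(height, kThreadsPerAxis),
             1 };
}
```

The result would then be passed on as `ctx->Dispatch(size.x, size.y, size.z)`. Because rounding up can launch threads beyond the edge of the texture, the shader would usually compare `id.xy` against the texture size before writing.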
The simplified compute shader itself:
SamplerState TextureSampler
{
    Filter = MIN_MAG_MIP_LINEAR;
    AddressU = Wrap;
    AddressV = Wrap;
};
Texture2D<float> inY : register(t0);
Texture2D<float> inU : register(t1);
Texture2D<float> inV : register(t2);
RWTexture2D<float> outY : register(u0);
RWTexture2D<float> outU : register(u1);
RWTexture2D<float> outV : register(u2);
[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    float3 col = float3(0.0, 0.0, 0.0);
    outY[id.xy] = col.r;
    outU[id.xy / 2] = col.g;
    outV[id.xy / 2] = col.b;
}
Is there anything obvious I am missing, or is Unity just very good at optimisation?
I managed to fix this issue by making a couple of changes in different places.
Firstly I changed the shader to write the output to a single texture object:
RWTexture2D<float4> unpackedRGBA : register(u0);
Then I created a texture that I could both write to in the shader and pass to Unity, which meant I no longer needed to make a texture copy. I think this was the real key to speeding up the process:
D3D11_TEXTURE2D_DESC texDesc = {}; // zero-initialise; Width, Height, MipLevels, ArraySize and SampleDesc must also be set
texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = 0;
texDesc.MiscFlags = 0;
The important part here is the combination of bind flags: the texture can be written to in the shader by binding the UAV pointer, and can also be handed to Unity using the SRV pointer.
In Unity I then created a texture using the SRV pointer:
IntPtr nativeTexturePtr = IntPtr.Zero;
nativeGetOutputTexture(ref nativeTexturePtr);
output = Texture2D.CreateExternalTexture(videoWidth, videoHeight, TextureFormat.RGBA32, false, false, nativeTexturePtr);
This resulted in render times comparable to my initial implementation using a Unity compute shader, but the output was still black. D3D11 will not bind a resource as an SRV while it is still bound for writing as a UAV, so the final fix was to unbind the output texture after dispatching the D3D11 compute shader, leaving it free to be bound in Unity when it needed to be rendered into the scene:
ctx->CSSetUnorderedAccessViews(0, 1, &gEmptyUav, nullptr);
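One way to make sure the unbind is never forgotten is to wrap the bind and unbind in a small scope guard. The following is only a sketch under some assumptions: `ScopedComputeUav` is a hypothetical helper, and the context and view types are template parameters so that the example compiles without the D3D11 headers. `RecordingContext` and `DummyView` are stand-ins that exist only to exercise the guard; in the plugin they would be `ID3D11DeviceContext` and `ID3D11UnorderedAccessView`.

```cpp
#include <cassert>

// Binds one UAV to a compute-shader slot for the lifetime of the guard and
// binds a null view to the same slot when the guard goes out of scope.
// Context must provide CSSetUnorderedAccessViews(slot, count, views, counts)
// with the same parameter order as the ID3D11DeviceContext method.
template <typename Context, typename View>
class ScopedComputeUav
{
public:
    ScopedComputeUav(Context& ctx, unsigned slot, View* view)
        : ctx_(ctx), slot_(slot)
    {
        assert(view != nullptr);
        ctx_.CSSetUnorderedAccessViews(slot_, 1, &view, nullptr);
    }

    ~ScopedComputeUav()
    {
        View* none = nullptr; // same role as gEmptyUav in the line above
        ctx_.CSSetUnorderedAccessViews(slot_, 1, &none, nullptr);
    }

    ScopedComputeUav(const ScopedComputeUav&) = delete;
    ScopedComputeUav& operator=(const ScopedComputeUav&) = delete;

private:
    Context& ctx_;
    unsigned slot_;
};

// Stand-in types, hypothetical and for demonstration only.
struct DummyView {};

struct RecordingContext
{
    DummyView* bound = nullptr; // view most recently bound to the slot
    int calls = 0;              // number of bind calls seen so far

    void CSSetUnorderedAccessViews(unsigned /*slot*/, unsigned /*count*/,
                                   DummyView* const* views,
                                   const unsigned* /*initialCounts*/)
    {
        bound = views[0];
        ++calls;
    }
};
```

In the plugin, the guard would be created just before `Dispatch` inside a nested scope, for example `ScopedComputeUav<ID3D11DeviceContext, ID3D11UnorderedAccessView> bind(*ctx, 0, outputUav);`, where `outputUav` is whatever UAV pointer the plugin holds for the output texture. When the scope ends, slot 0 is reset to a null view and Unity can bind the texture through its SRV.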