Tags: directx-11, direct3d, direct3d11

How to best organize constant buffers


I'm having some trouble wrapping my head around how to organize the constant buffers in a very basic D3D11 engine I'm making.

My main question is: Where does the biggest performance hit take place? When using Map/Unmap to update buffer data or when binding the cbuffers themselves?

At the moment, I'm deciding between the following two implementations for a sort of "shader-wrapper" class:

Option 1: Holding an array of 14 ID3D11Buffer*s

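#include <d3d11.h>
#include <cstring>

class VertexShader
{
    // ...

public:
    void Bind(ID3D11DeviceContext* context)
    {
        // Bind all 14 slots (b0-b13) in a single call; unused slots hold nullptr
        context->VSSetConstantBuffers(0, D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT, buffers);
        context->VSSetShader(pVS, nullptr, 0);
    }

    // Overwrite the whole buffer in a particular slot (WRITE_DISCARD requires a
    // buffer created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE)
    void SetData(ID3D11DeviceContext* context, UINT slot, size_t size, const void* pData)
    {
        D3D11_MAPPED_SUBRESOURCE mappedBuffer = {};
        if (SUCCEEDED(context->Map(buffers[slot], 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedBuffer)))
        {
            memcpy(mappedBuffer.pData, pData, size);
            context->Unmap(buffers[slot], 0);
        }
    }

private:
    // D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT is 14
    ID3D11Buffer*       buffers[D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT] = {};
    ID3D11VertexShader* pVS = nullptr;
};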

This approach has the shader bind all of its cbuffers in a single call covering all 14 slots. If the shader has cbuffers registered to b0, b1, and b3, the array would look like this: [cb|cb|0|cb|0|0|0|0|0|0|0|0|0|0]
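The slot layout just described can be modeled in a few lines of portable C++. This is a minimal sketch, not the asker's class: `FakeBuffer` stands in for `ID3D11Buffer` so it builds without the Windows SDK, and `BuildSlotArray`, `TightRange`, and `SlotRange` are hypothetical helpers. It fills the 14-entry array from the registered slots and, as one possible refinement, computes the smallest contiguous range covering the used slots so the bind call does not have to span all 14.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>

// Stand-in for ID3D11Buffer so the sketch builds without the Windows SDK (assumption).
struct FakeBuffer {};

// Same value as D3D11_COMMONSHADER_CONSTANT_BUFFER_API_SLOT_COUNT in d3d11.h.
constexpr std::size_t kSlotCount = 14;
using SlotArray = std::array<FakeBuffer*, kSlotCount>;

// Place each buffer at its register index (bN); every other slot stays nullptr.
SlotArray BuildSlotArray(std::initializer_list<std::pair<std::size_t, FakeBuffer*>> registered)
{
    SlotArray slots{};
    for (const auto& [slot, buffer] : registered)
    {
        assert(slot < kSlotCount && "register index out of range");
        slots[slot] = buffer;
    }
    return slots;
}

// Smallest contiguous range [first, first + count) covering every non-null slot.
struct SlotRange
{
    std::size_t first;
    std::size_t count;
};

SlotRange TightRange(const SlotArray& slots)
{
    std::size_t first = kSlotCount;
    std::size_t last = 0;
    for (std::size_t i = 0; i < kSlotCount; ++i)
    {
        if (slots[i] == nullptr)
            continue;
        if (first == kSlotCount)
            first = i;
        last = i;
    }
    if (first == kSlotCount)
        return {0, 0}; // no cbuffers registered
    return {first, last - first + 1};
}
```

For b0, b1, and b3 this yields first = 0 and count = 4, which would map to `VSSetConstantBuffers(0, 4, &buffers[0])`. The trade-off: binding all 14 slots also clears stale buffers a previous shader left in the unused slots, while binding only the tight range leaves them in place, which is usually harmless because the shader never reads those registers.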

Option 2: Constant Buffer wrapper that knows how to bind itself

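#include <d3d11.h>
#include <string>
#include <unordered_map>

// ConstantBuffer is the self-binding wrapper described below; its definition isn't shown here.
class VertexShader
{
    // ...

public:
    void Bind(ID3D11DeviceContext* context)
    {
        // All the buffers bind themselves (one VSSetConstantBuffers call each)
        for (auto& [name, cb] : bufferMap)
            cb->Bind(context);

        context->VSSetShader(pVS, nullptr, 0);
    }

    // Set the data for the buffer registered under a particular name
    void SetData(ID3D11DeviceContext* context, const std::string& name, size_t size, const void* pData)
    {
        auto it = bufferMap.find(name);
        if (it == bufferMap.end())
            return;
        // ... then Map/Unmap it->second's buffer, as in the first approach
    }

private:
    std::unordered_map<std::string, ConstantBuffer*> bufferMap;
    ID3D11VertexShader* pVS = nullptr;
};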

This approach would hold ConstantBuffer objects in a hash table; each one would know which slot it's bound to and how to bind itself to the pipeline. I would have to call VSSetConstantBuffers() separately for each cbuffer, since the ID3D11Buffer*s would no longer be contiguous, but the organization is friendlier and wastes a bit less space.
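The self-binding wrapper is never defined in the question, so here is one hypothetical shape for it. `ConstantBuffer` is templated on the context and buffer types, an assumption made so the sketch can be tested without the Windows SDK; instantiated with `ID3D11DeviceContext` and `ID3D11Buffer` it would call the real `VSSetConstantBuffers`, while `RecordingContext` and `FakeBuffer` are mocks that just log each call. It makes the cost described above visible: one bind call per cbuffer instead of one call for the whole array.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical wrapper for the second approach: each buffer remembers its register slot (bN).
template <typename Context, typename Buffer>
class ConstantBuffer
{
public:
    ConstantBuffer(unsigned slot, Buffer* buffer) : m_slot(slot), m_buffer(buffer)
    {
        assert(slot < 14 && "D3D11 exposes 14 constant buffer slots per stage");
    }

    // Binds exactly one buffer to its own slot.
    void Bind(Context* context) const
    {
        context->VSSetConstantBuffers(m_slot, 1, &m_buffer);
    }

private:
    unsigned m_slot;
    Buffer*  m_buffer;
};

// Mock stand-ins for ID3D11Buffer / ID3D11DeviceContext (assumption, for testing only).
struct FakeBuffer {};

struct RecordingContext
{
    struct Call { unsigned startSlot; unsigned numBuffers; };
    std::vector<Call> calls;

    // Same parameter shape as ID3D11DeviceContext::VSSetConstantBuffers; records the call.
    void VSSetConstantBuffers(unsigned startSlot, unsigned numBuffers, FakeBuffer* const* /*buffers*/)
    {
        calls.push_back({startSlot, numBuffers});
    }
};

using CB = ConstantBuffer<RecordingContext, FakeBuffer>;

// Mirrors the loop in VertexShader::Bind: each buffer in the map binds itself.
void BindAll(const std::unordered_map<std::string, CB>& bufferMap, RecordingContext& context)
{
    for (const auto& entry : bufferMap)
        entry.second.Bind(&context);
}
```

With cbuffers at b0, b1, and b3 this issues three single-buffer calls. `std::unordered_map` iteration order is unspecified, which doesn't matter here because each call targets its own slot.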

How would you typically organize the relationship between CBuffers, shaders, SRVs, etc.? I'm not looking for a do-it-all solution, just some general advice and things to read more about from people (hopefully) more experienced than I am.

Also, if @Chuck Walbourn sees this: I'm a fan of your work and am using DXTK/WICTextureLoader for this project!

Thanks.


Solution

  • Constant Buffers were a major feature of Direct3D 10, so one of the best talks on the subject was given way back at Gamefest 2007:

    "Windows to Reality: Getting the Most out of Direct3D 10 Graphics in Your Games"

    See also "Why Can Updating Constant Buffers be so painfully slow?" (NVIDIA)

    The original intention was for CBs to be organized by frequency of update: something like one CB for data that is set 'per level', another for 'per frame', another for 'per object', another for 'per pass', etc. The assumption is therefore that if you change any part of a CB, you upload the whole thing. Bandwidth between the CPU and GPU is the real bottleneck here.

    For this approach to be effective, you basically need to set up all your shaders to use the same scheme. This can be difficult to manage, especially when so many modern material systems are art-driven.

    Another approach to CBs is to use them like a dynamic VB for particle submission: you fill it up with short-lived constants, submit work, and then reset it each frame. This is basically what people do for DirectX 12 in many cases. The problem is that without the ability to update parts of a CB, it's too slow. The 'partial constant buffer updates' and 'constant buffer offsets' optional features in DirectX 11.1 were a way to make this work. That said, these features are not supported on Windows 7 and are optional on newer versions of Windows, so you have to support two codepaths to use them.

    TL;DR: you can technically have a lot of CBs bound at once, but the key is to keep the ones that change often small. Also assume that any change to a CB requires uploading the whole thing to the GPU again.
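The frequency-of-update organization from the answer can be sketched on the CPU side as one struct per update rate, each mirroring an HLSL cbuffer at a fixed register that every shader agrees on. This is a minimal sketch under stated assumptions: the struct names, members, and slot numbers are hypothetical, and plain float arrays stand in for DirectXMath types so it compiles anywhere. The static_asserts enforce the D3D11 rule that a constant buffer's ByteWidth must be a multiple of 16.

```cpp
#include <cstddef>

// Shared slot scheme: every shader declares these cbuffers at the same registers (hypothetical).
enum class CBSlot : unsigned
{
    PerFrame    = 0, // register(b0): updated once per frame
    PerObject   = 1, // register(b1): updated for every draw
    PerMaterial = 2, // register(b2): updated when the material changes
};

// HLSL (assumed): cbuffer PerFrame : register(b0) { float4x4 ViewProj; float3 CameraPos; float Time; };
struct PerFrameConstants
{
    float viewProj[16];
    float cameraPos[3];
    float time;          // packs into the same 16-byte register as cameraPos
};

// HLSL (assumed): cbuffer PerObject : register(b1) { float4x4 World; };
struct PerObjectConstants
{
    float world[16];
};

// HLSL (assumed): cbuffer PerMaterial : register(b2) { float4 Diffuse; float Roughness; };
struct PerMaterialConstants
{
    float diffuse[4];
    float roughness;
    float pad[3];        // explicit padding up to the next 16-byte boundary
};

// A D3D11 constant buffer's ByteWidth must be a multiple of 16.
constexpr std::size_t RoundUpTo16(std::size_t bytes) { return (bytes + 15) & ~std::size_t{15}; }

static_assert(sizeof(PerFrameConstants)    % 16 == 0, "cbuffer size must be a multiple of 16");
static_assert(sizeof(PerObjectConstants)   % 16 == 0, "cbuffer size must be a multiple of 16");
static_assert(sizeof(PerMaterialConstants) % 16 == 0, "cbuffer size must be a multiple of 16");
```

Here the per-object CB, the one rewritten for every draw, is only 64 bytes, so the data that changes most often is also the cheapest to re-upload in full.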
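The "fill it up, submit, reset each frame" pattern relies on the Direct3D 11.1 `VSSetConstantBuffers1` call, whose `pFirstConstant` and `pNumConstants` arguments are measured in 16-byte constants and must each be a multiple of 16 constants (256 bytes), with at most 4096 constants per binding. Below is a minimal, hypothetical sub-allocator (`ConstantRing` is not a D3D type) that only does that offset arithmetic. Actually mapping the buffer (`D3D11_MAP_WRITE_NO_OVERWRITE` for sub-allocations, `D3D11_MAP_WRITE_DISCARD` on reset) and checking `D3D11_FEATURE_DATA_D3D11_OPTIONS::ConstantBufferOffsetting` are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Offset arithmetic for sub-allocating one large dynamic constant buffer per frame.
// Units follow VSSetConstantBuffers1: one "constant" is 16 bytes (a float4).
class ConstantRing
{
public:
    static constexpr std::uint32_t kBytesPerConstant = 16;
    static constexpr std::uint32_t kConstantGranularity = 16;     // offsets/sizes: multiples of 16 constants
    static constexpr std::uint32_t kMaxConstantsPerBinding = 4096; // 64 KB window per binding

    struct Allocation
    {
        std::uint32_t firstConstant; // value to pass via pFirstConstant
        std::uint32_t numConstants;  // value to pass via pNumConstants
        std::size_t   byteOffset;    // where to memcpy into the mapped buffer
    };

    explicit ConstantRing(std::uint32_t capacityConstants) : m_capacity(capacityConstants)
    {
        assert(capacityConstants % kConstantGranularity == 0);
    }

    // Reserve room for `bytes` of constants; nullopt if too large or this frame's space is used up.
    std::optional<Allocation> Allocate(std::size_t bytes)
    {
        if (bytes == 0 || bytes > std::size_t(kMaxConstantsPerBinding) * kBytesPerConstant)
            return std::nullopt;

        auto constants = static_cast<std::uint32_t>((bytes + kBytesPerConstant - 1) / kBytesPerConstant);
        // Round the size up to the required 16-constant granularity.
        constants = (constants + kConstantGranularity - 1) / kConstantGranularity * kConstantGranularity;
        if (constants > kMaxConstantsPerBinding || m_next + constants > m_capacity)
            return std::nullopt;

        Allocation a{m_next, constants, std::size_t(m_next) * kBytesPerConstant};
        m_next += constants;
        return a;
    }

    // Call once per frame, when the whole buffer is re-mapped with D3D11_MAP_WRITE_DISCARD.
    void Reset() { m_next = 0; }

private:
    std::uint32_t m_capacity;
    std::uint32_t m_next = 0;
};
```

A 100-byte allocation therefore occupies a full 256-byte block, so this scheme trades some padding for never re-uploading data that other draws already wrote.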
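To put numbers on the TL;DR, here is a back-of-the-envelope model under the answer's assumption that every change re-uploads the whole CB. The sizes and draw counts are made-up illustration values, not measurements, and `UploadBytes` and `CBUsage` are hypothetical names.

```cpp
#include <cstddef>
#include <initializer_list>

// One constant buffer in a layout: its size and how often it changes per frame.
struct CBUsage
{
    std::size_t bytes;
    std::size_t updatesPerFrame;
};

// Bytes sent to the GPU per frame, assuming every update re-uploads the whole buffer.
constexpr std::size_t UploadBytes(std::initializer_list<CBUsage> layout)
{
    std::size_t total = 0;
    for (const CBUsage& cb : layout)
        total += cb.bytes * cb.updatesPerFrame;
    return total;
}

// Illustrative numbers: 1000 draws, 1024 bytes of constants in total.
// Monolithic: the whole 1024-byte CB is rewritten for every draw.
constexpr std::size_t kMonolithic = UploadBytes({{1024, 1000}});
// Split: 960 bytes of per-frame data once, 64 bytes of per-object data per draw.
constexpr std::size_t kSplit = UploadBytes({{960, 1}, {64, 1000}});
```

With the same 1024 bytes of constants, the monolithic layout uploads 1,024,000 bytes per frame while the split layout uploads 64,960, roughly a 16x difference that comes entirely from keeping the frequently changing CB small.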