c++ c++11 memory game-engine heap-memory

How to avoid heap allocation when inserting render commands into a RenderCommandBuffer?


I have a RenderQueue that sorts a list of elements to render. The RenderQueue then creates a RenderCommandBuffer containing all the "low level" rendering operations. The problem is that performance drops from 1400 FPS to 40 FPS for 1000 elements.

I profiled the app and the problem lies here (in these per-frame allocations):

        std::for_each(element.meshCommands.begin(), element.meshCommands.end(), [&] (auto &command) {
            // Heap allocation: storage for the vector holding the single (name, matrix) pair.
            std::vector<std::pair<std::string, glm::mat4> > p{ { "MVP", VPmatrix * command.second} };
            // At least three more heap allocations per mesh, one per make_shared.
            m_commandBuffer.addCommand(std::make_shared<SetShaderValuesCommand>(element.material,p));
            m_commandBuffer.addCommand(std::make_shared<BindMaterialCommand>(element.material));
            m_commandBuffer.addCommand(std::make_shared<RenderMeshCommand>(meshProperty.mesh));
        });

I know that I can group my meshes by material, but the problem remains more or less the same: many objects are allocated per frame. How would you avoid this situation? How do game engines deal with this problem? Memory pools?


Solution

  • Details are scant, but I see two opportunities for tuning.

    m_commandBuffer is a polymorphic container of some kind. I completely understand why you would build it this way, but it presents a problem - each element must be separately allocated.

    You may well get much better performance by amalgamating all render operations into a variant and implementing m_commandBuffer as a vector (or queue) of such variants. With a vector, you can reserve() space for the 1000 commands with a single memory allocation, rather than the (at least) 1000 you currently require.

    It also means that you pay the synchronization cost of only one allocation, rather than the thousands of atomic operations you currently incur while incrementing and decrementing the reference counts in all those shared_ptrs.

    So:

    using Command = boost::variant< SetShaderValuesCommand, BindMaterialCommand, RenderMeshCommand>;

    // Note: std::deque has no reserve(); use std::vector<Command> if you want to pre-allocate.
    using CommandQueue = std::deque<Command>;
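
As a rough sketch of how such a buffer could be filled each frame, the example below swaps in C++17's std::variant for boost::variant so that it compiles without Boost, and uses a std::vector so that reserve() is available. The command structs, the RenderCommandBuffer wrapper, and recordFrame are hypothetical stand-ins, not the asker's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the question's command classes. The real ones
// hold materials, meshes and shader values; plain ints keep the sketch small.
struct SetShaderValuesCommand { int materialId; int transformIndex; };
struct BindMaterialCommand    { int materialId; };
struct RenderMeshCommand      { int meshId; };

using Command = std::variant<SetShaderValuesCommand,
                             BindMaterialCommand,
                             RenderMeshCommand>;

// Hypothetical buffer type: a thin wrapper around one contiguous vector.
class RenderCommandBuffer {
public:
    explicit RenderCommandBuffer(std::size_t expectedCommands) {
        m_commands.reserve(expectedCommands);  // the single up-front allocation
    }

    // Stores the command by value inside the vector's own storage.
    void addCommand(Command cmd) { m_commands.push_back(std::move(cmd)); }

    // clear() destroys the commands but keeps the capacity, so the next
    // frame reuses the same block of memory.
    void beginFrame() { m_commands.clear(); }

    std::size_t size() const { return m_commands.size(); }
    std::size_t capacity() const { return m_commands.capacity(); }
    const Command* data() const { return m_commands.data(); }

private:
    std::vector<Command> m_commands;
};

// Records the three commands per mesh that the question's loop emits.
void recordFrame(RenderCommandBuffer& buffer, int meshCount) {
    assert(meshCount >= 0);
    buffer.beginFrame();
    for (int i = 0; i < meshCount; ++i) {
        buffer.addCommand(SetShaderValuesCommand{1, i});
        buffer.addCommand(BindMaterialCommand{1});
        buffer.addCommand(RenderMeshCommand{i});
    }
}
```

Because clear() leaves the capacity untouched, a buffer constructed once with RenderCommandBuffer buffer(3000) and refilled via recordFrame(buffer, 1000) every frame should perform no further allocations after construction, assuming the commands themselves own no heap memory.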
    

    Executing the commands then becomes:

    // The generic lambda needs C++14; with C++11, pass a boost::static_visitor functor instead.
    for (auto& cmd : m_commandBuffer) {
      boost::apply_visitor([](auto& actualCmd) {
        actualCmd.run(); /* or whatever is the interface */
      }, cmd);
    }
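
One way to check the allocation claim is to count calls to the global operator new. The sketch below assumes C++17 and uses std::variant in place of boost::variant; the command types and the two helper functions are hypothetical and much simpler than real render commands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <variant>
#include <vector>

// Global counter bumped by the replacement operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical polymorphic design, as in the question: one heap object per command.
struct ICommand {
    virtual ~ICommand() = default;
    virtual int run() const = 0;
};
struct BindMaterial : ICommand {
    int materialId;
    explicit BindMaterial(int id) : materialId(id) {}
    int run() const override { return materialId; }
};

// Hypothetical value-type commands for the variant design.
struct BindMaterialV { int materialId; int run() const { return materialId; } };
struct RenderMeshV   { int meshId;     int run() const { return meshId; } };
using Command = std::variant<BindMaterialV, RenderMeshV>;

// Number of operator new calls needed to queue n polymorphic commands.
std::size_t allocationsWithSharedPtr(std::size_t n) {
    const std::size_t before = g_allocations;
    std::vector<std::shared_ptr<ICommand>> buffer;
    buffer.reserve(n);  // one allocation for the array of pointers
    for (std::size_t i = 0; i < n; ++i)
        buffer.push_back(std::make_shared<BindMaterial>(static_cast<int>(i)));  // one more each
    assert(buffer.size() == n);
    return g_allocations - before;
}

// Number of operator new calls needed to queue n variant commands.
std::size_t allocationsWithVariant(std::size_t n) {
    const std::size_t before = g_allocations;
    std::vector<Command> buffer;
    buffer.reserve(n);  // the only allocation
    for (std::size_t i = 0; i < n; ++i)
        buffer.emplace_back(RenderMeshV{static_cast<int>(i)});
    assert(buffer.size() == n);
    return g_allocations - before;
}
```

On common standard library implementations, allocationsWithSharedPtr(1000) should report about 1001 allocations and allocationsWithVariant(1000) exactly one; the exact shared_ptr figure is implementation-dependent.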