Tags: c++, opengl, glsl, glm-math

OpenGL: Batch Renderer: Should Transformations Take place on the CPU or GPU?


I am developing a 2D game engine that will support 3D in the future. In this current phase of development, I am working on the batch renderer. As some of you may know, when batching graphics together, per-object uniforms for color (RGBA), texture coordinates, texture ID (texture index), and the model transformation matrix go out the window; that data has to be passed through the vertex buffer instead. Right now, I have implemented passing the model's positions, color, texture coordinates, and the texture ID to the vertex buffer. My vertex buffer format looks like this right now:

float v0[] = {x, y, r, g, b, a, u, v, textureID};
float v1[] = {x, y, r, g, b, a, u, v, textureID};
float v2[] = {x, y, r, g, b, a, u, v, textureID};
float v3[] = {x, y, r, g, b, a, u, v, textureID};

I am about to integrate calculating where the object should be in world space using a transformation matrix. This leads me to ask the question:

Should the transformation matrix be multiplied by the model vertex positions on the CPU or GPU?

Something to keep in mind is that if I pass the matrix through the vertex buffer, I would have to upload it once per vertex (four times per sprite), which seems like a waste of memory. On the other hand, multiplying the model vertex positions by the transformation matrix on the CPU seems slower than exploiting the GPU's parallelism.
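For reference, the CPU-side option amounts to something like the following sketch: multiply each sprite corner by the model matrix before writing the position into the vertex buffer (`transformPoint` is an illustrative helper, not part of the engine's code; the matrix is column-major, OpenGL's convention):

```cpp
#include <cassert>

// Transform a 2D point by a column-major 4x4 model matrix, the way a
// batcher would before writing the position into the vertex buffer.
// z is assumed 0 and w assumed 1 for 2D sprites.
void transformPoint(const float m[16], float& x, float& y) {
    float tx = m[0] * x + m[4] * y + m[12];
    float ty = m[1] * x + m[5] * y + m[13];
    x = tx;
    y = ty;
}
```

With an identity matrix whose translation column is (3, 4), the point (1, 2) maps to (4, 6).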

This is how my vertex buffer format would look if I calculate the transform on the GPU:

float v0[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v1[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v2[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};
float v3[] = {x, y, r, g, b, a, u, v, textureID, m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12, m13, m14, m15};

The question is mostly theoretical, so a theoretical and technical answer would be much appreciated. But for reference, here is the code.


Solution

  • Should Transformations Take place on the CPU or GPU?

    It really depends on the situation at hand. If you resubmit your vertices every frame, benchmark both approaches for your case. If you want to animate without resubmitting all your vertices, you have no choice but to apply the transform on the GPU.

    Whatever the reason, if you decide to apply the transformations on the GPU, there are better ways to do it than duplicating the matrix for each vertex. I'd instead put the transformation matrices in an SSBO:

    layout(std430, binding=0) buffer Models {
        mat4 MV[]; // model-view matrices
    };
    

    and store a single index in each vertex in the VAO:

    struct Vert {
        float x, y, r, g, b, a, u, v;
        int textureID, model;
    };
    

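    One wrinkle worth noting when wiring this struct up: the integer `model` attribute must be set with `glVertexAttribIPointer` (capital I), otherwise the int is converted to float on its way into the shader. A minimal sketch of the layout math, with the GL calls shown as comments since they need a live context:

```cpp
#include <cstddef>
#include <cassert>

struct Vert {
    float x, y, r, g, b, a, u, v;
    int textureID, model;
};

// Offsets derived from the struct, for the attribute-setup calls below:
//   glVertexAttribPointer (0, 2, GL_FLOAT, GL_FALSE, stride, (void*)posOff);
//   glVertexAttribIPointer(3, 1, GL_INT,             stride, (void*)modelOff);
// Note glVertexAttribIPointer for the integer model index; the plain
// glVertexAttribPointer variant would convert it to a float.
const std::size_t stride   = sizeof(Vert);
const std::size_t posOff   = offsetof(Vert, x);
const std::size_t modelOff = offsetof(Vert, model);
```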
    The vertex shader can then fetch the full matrix through the index attribute:

    layout(location = 0) in vec4 in_pos;
    layout(location = 1) in int in_model;
    void main() {
        gl_Position = MV[in_model] * in_pos;
    }
    

    You can even combine it with other per-object attributes, like the textureID.
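    On the CPU side, the bookkeeping for this scheme is cheap: one matrix slot per sprite goes into the array that backs the SSBO, and all four vertices of that sprite store the slot's index. A sketch, with illustrative names (`Batch`, `pushQuad` are not from the question's code):

```cpp
#include <vector>
#include <cassert>

struct Mat4 { float m[16]; };

struct Batch {
    std::vector<Mat4> matrices;      // uploaded to the SSBO each frame
    std::vector<int>  vertexModels;  // the per-vertex 'model' attribute

    void pushQuad(const Mat4& model) {
        int slot = static_cast<int>(matrices.size());
        matrices.push_back(model);
        // 4 ints per sprite instead of 4 x 16 floats
        for (int i = 0; i < 4; ++i)
            vertexModels.push_back(slot);
    }
};
```

Pushing two quads stores two matrices and the vertex indices 0,0,0,0,1,1,1,1.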

    EDIT: you can achieve something similar with instancing and multi-draw, though it's likely to be slower.