c++ opengl caching glm-math

Cache-friendly vertex definition?


I am writing an OpenGL application and for vertices, normals, and colors, I am using separate buffers as follows:

GLuint vertex_buffer, normal_buffer, color_buffer;

My supervisor tells me that if I define a struct like:

struct vertex {
    glm::vec3 pos;
    glm::vec3 normal;
    glm::vec3 color;
};
GLuint vertex_buffer;

and then define a buffer of these vertices, my application will get much faster, because when a position is cached, its normal and color will be in the same cache line.

I think that defining such a struct does not affect performance much: with the struct, fewer vertices fit in a cache line, while with separate buffers there are three different cache lines in the cache, one each for positions, normals, and colors. So nothing has changed. Is that true?


Solution

  • First of all, using separate buffers for different vertex attributes may not be a good technique.

    A very important factor here is GPU architecture. Most (especially modern) GPUs have multiple caches (for Input Assembler stage data, uniforms, textures), but fetching input attributes from multiple VBOs can be inefficient anyway (always profile!). Defining them in an interleaved format, where all attributes of one vertex sit next to each other in a single buffer, can help improve performance, and that is what you would get if you used such a struct.
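As a rough illustration of what the interleaved layout means in practice, the sketch below computes the stride and byte offsets that would be handed to `glVertexAttribPointer` for the struct from the question. `Vec3` is a stand-in for `glm::vec3` so the snippet compiles without GLM, and `AttribLayout` and `interleaved_layout` are hypothetical names introduced only for this example; attribute indices 0 to 2 are likewise an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for glm::vec3 (assumption: three tightly packed 4-byte floats).
struct Vec3 { float x, y, z; };

// Same shape as the struct in the question.
struct Vertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

static_assert(sizeof(Vertex) == 3 * sizeof(Vec3), "no padding expected here");

// Hypothetical helper type: one entry per attribute, holding the values for
// glVertexAttribPointer(index, components, GL_FLOAT, GL_FALSE, stride, (void*)offset).
struct AttribLayout {
    unsigned    index;       // attribute location (assumed 0, 1, 2)
    int         components;  // floats per attribute
    std::size_t stride;      // bytes from one vertex to the next
    std::size_t offset;      // bytes from the start of the vertex
};

inline std::array<AttribLayout, 3> interleaved_layout() {
    std::array<AttribLayout, 3> layout = {{
        {0, 3, sizeof(Vertex), offsetof(Vertex, pos)},
        {1, 3, sizeof(Vertex), offsetof(Vertex, normal)},
        {2, 3, sizeof(Vertex), offsetof(Vertex, color)},
    }};
    // Sanity check: the last attribute ends exactly at the end of the vertex.
    assert(layout[2].offset + sizeof(Vec3) == sizeof(Vertex));
    return layout;
}
```

All three attributes share one buffer and one stride; only the offset differs. With separate buffers, each attribute would instead use its own buffer with offset 0.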

    However, that's not always true (again, always profile!): although interleaved data is more GPU-friendly, it needs to be properly aligned, and the padding this can require may take significantly more space in memory.
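To make the alignment cost concrete, here is a small sketch with a hypothetical `MixedVertex` that mixes floats with byte-sized color channels. The struct and helper names are invented for illustration, and the sizes assume a mainstream ABI with 4-byte, 4-byte-aligned `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vertex mixing 4-byte floats with 1-byte color channels.
struct MixedVertex {
    float        pos[3];    // 12 bytes of payload
    std::uint8_t color[3];  // 3 bytes of payload
    // The compiler appends 1 byte of padding here so that pos[] of the
    // next array element is again 4-byte aligned.
};

// Bytes of actual data per vertex versus bytes actually stored.
const std::size_t kPayloadBytes = 3 * sizeof(float) + 3 * sizeof(std::uint8_t);
const std::size_t kPaddingBytes = sizeof(MixedVertex) - kPayloadBytes;

// Total padding overhead in a buffer of n such vertices.
inline std::size_t wasted_bytes(std::size_t n) {
    assert(n > 0);
    return n * kPaddingBytes;
}
```

The overhead here is small (1 byte in 16), but it grows with wider alignment requirements or more mixed-size attributes, which is one reason to measure rather than assume.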

    But, in general:

    Interleaved data formats:

    • Cause less GPU cache pressure, because the coordinates and attributes of a single vertex aren't scattered all over memory. They fit consecutively into a few cache lines, whereas scattered attributes could cause more cache updates and therefore evictions. The worst-case scenario could be one (attribute) element per cache line at a time because of distant memory locations, while vertices get pulled in a non-deterministic/non-contiguous manner, where possibly no prediction or prefetching kicks in. GPUs are very similar to CPUs in this matter.

    • Are also very useful for external formats that match the deprecated interleaved formats, because datasets from compatible data sources can be read straight into mapped GPU memory. I ended up re-implementing these interleaved formats with the current API for exactly those reasons.

    • Should be laid out in an alignment-friendly way, just like simple arrays. Mixing data types with different size/alignment requirements may need padding to be GPU- and CPU-friendly. This is the only downside I know of, apart from the more difficult implementation.

    • Do not prevent you from pointing to individual attribute arrays inside them for sharing.
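The cache-pressure argument can be checked with simple arithmetic. The sketch below counts how many cache lines are touched when fetching one vertex, and when reading a whole mesh once, for both layouts. It assumes 64-byte cache lines, 12-byte attributes, and separate buffers that each start on a line boundary far from one another; real GPU caches differ, so treat this as a model rather than a measurement. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kLineBytes = 64;  // assumed cache-line size
const std::size_t kAttrBytes = 12;  // one vec3 of 4-byte floats
const std::size_t kAttrCount = 3;   // position, normal, color

// Number of cache lines overlapped by `size` bytes starting at `offset`.
inline std::size_t lines_spanned(std::size_t offset, std::size_t size) {
    assert(size > 0);
    return (offset + size - 1) / kLineBytes - offset / kLineBytes + 1;
}

// Lines touched to fetch vertex i from one interleaved buffer.
inline std::size_t lines_interleaved(std::size_t i) {
    const std::size_t stride = kAttrBytes * kAttrCount;
    return lines_spanned(i * stride, stride);
}

// Lines touched to fetch vertex i from three separate buffers.
inline std::size_t lines_separate(std::size_t i) {
    return kAttrCount * lines_spanned(i * kAttrBytes, kAttrBytes);
}

// Lines touched when all n vertices are read exactly once, in order.
inline std::size_t lines_whole_mesh_interleaved(std::size_t n) {
    return lines_spanned(0, n * kAttrBytes * kAttrCount);
}

inline std::size_t lines_whole_mesh_separate(std::size_t n) {
    return kAttrCount * lines_spanned(0, n * kAttrBytes);
}
```

Under these assumptions, a single indexed fetch touches one or two lines in the interleaved layout but at least three with separate buffers, while a full in-order pass touches the same number of lines either way. That matches both views in the question: the layouts tie for purely sequential access, and interleaving pulls ahead when vertices are fetched through an index buffer in a scattered order.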

    Source

    Further reads:

    Best Practices for Working with Vertex Data

    Vertex Specification Best Practices