Tags: java, opengl, native, lwjgl

LWJGL: Buffer Memory Management


I'm looking for some memory/performance advice on which approach is better.

Let's say I have 4 attributes for a mesh:

Vertex    3f
Normal    3f
TexCoords 2f
jointID   4i [Integer Joint Indices For Skeleton Animation]

I need these in CPU memory because they can be modified at any time.

Is it better to

a. Create 4 separate buffers, one for each attribute:

//3, 2 and 4 are the number of components per vertex, i.e. a vertex is 3 floats, a texCoord is 2 floats and so on
FloatBuffer vertices=BufferUtils.createFloatBuffer(numOfVertices*3);
FloatBuffer normals=BufferUtils.createFloatBuffer(numOfVertices*3);
FloatBuffer texCoords=BufferUtils.createFloatBuffer(numOfVertices*2);
IntBuffer   vertexJoints=BufferUtils.createIntBuffer(numOfVertices*4);

Or

b. Create one large ByteBuffer with enough capacity to store all 4 attributes, and create separate Float/Int buffer views for each of the attributes:

 //byte offsets of each attribute block; a float and an int are both 4 bytes
 int endVertexByte     = numOfVertices*3*4;
 int endNormalByte     = endVertexByte   + numOfVertices*3*4;
 int endTexCoordByte   = endNormalByte   + numOfVertices*2*4;
 int endJointIndexByte = endTexCoordByte + numOfVertices*4*4; //= total size in bytes
 ByteBuffer  meshData=BufferUtils.createByteBuffer(endJointIndexByte);
 FloatBuffer vertices=meshData.position(0).limit(endVertexByte).asFloatBuffer(); //chaining position()/limit() needs Java 9+
 FloatBuffer normals=meshData.position(endVertexByte).limit(endNormalByte).asFloatBuffer();
 FloatBuffer texCoords=meshData.position(endNormalByte).limit(endTexCoordByte).asFloatBuffer();
 IntBuffer   jointIDs=meshData.position(endTexCoordByte).limit(endJointIndexByte).asIntBuffer();

According to the docs, all BufferUtils methods create a direct buffer, which is stored in native memory. The capacity in the second approach is multiplied by 4, but only because a ByteBuffer's capacity is counted in bytes while a FloatBuffer's or IntBuffer's capacity is counted in 4-byte elements, so both approaches allocate the same total amount of native memory. The difference is that the second approach creates only one large native memory chunk instead of 4 separate memory areas.

But that's just my opinion. Thoughts?
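For reference, approach (b) can be sketched with plain java.nio. This is a minimal sketch, not LWJGL code: it assumes BufferUtils.createByteBuffer(n) behaves like ByteBuffer.allocateDirect(n).order(ByteOrder.nativeOrder()), and the class MeshData and its separateBytes helper are hypothetical names. It illustrates two things: the single ByteBuffer holds the same number of bytes as the four separate buffers of approach (a), and the four views share that one native block at known byte offsets.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

// Hypothetical helper illustrating approach (b); not part of LWJGL.
class MeshData {
    final int endVertexByte, endNormalByte, endTexCoordByte, endJointIndexByte;
    final ByteBuffer meshData;
    final FloatBuffer vertices, normals, texCoords;
    final IntBuffer jointIDs;

    MeshData(int numOfVertices) {
        // End offset (in bytes) of each attribute block; floats and ints are 4 bytes each.
        endVertexByte     = numOfVertices * 3 * Float.BYTES;
        endNormalByte     = endVertexByte + numOfVertices * 3 * Float.BYTES;
        endTexCoordByte   = endNormalByte + numOfVertices * 2 * Float.BYTES;
        endJointIndexByte = endTexCoordByte + numOfVertices * 4 * Integer.BYTES;

        // Assumed equivalent of BufferUtils.createByteBuffer: direct memory in native byte order.
        meshData = ByteBuffer.allocateDirect(endJointIndexByte).order(ByteOrder.nativeOrder());

        // Each view covers [position, limit) of meshData, shares its memory and inherits its byte order.
        vertices  = range(0, endVertexByte).asFloatBuffer();
        normals   = range(endVertexByte, endNormalByte).asFloatBuffer();
        texCoords = range(endNormalByte, endTexCoordByte).asFloatBuffer();
        jointIDs  = range(endTexCoordByte, endJointIndexByte).asIntBuffer();

        meshData.clear(); // back to position 0, limit = capacity (e.g. for one whole-buffer upload)
    }

    // Selects [start, end) on meshData; separate statements so this also compiles on Java 8.
    private ByteBuffer range(int start, int end) {
        meshData.limit(end);
        meshData.position(start);
        return meshData;
    }

    // Bytes used by the four separate buffers of approach (a): their capacities count
    // 4-byte elements, so the byte total equals meshData.capacity().
    static long separateBytes(int numOfVertices) {
        long floats = (long) numOfVertices * (3 + 3 + 2);
        long ints = (long) numOfVertices * 4;
        return floats * Float.BYTES + ints * Integer.BYTES;
    }
}
```

With 1000 vertices both approaches come to 48000 bytes (12 components × 4 bytes × 1000 vertices). The sketch sets position/limit on meshData itself rather than calling slice() or duplicate(), because those reset a ByteBuffer's byte order to big-endian, so views created from them would not be in native order unless you call order(ByteOrder.nativeOrder()) on them again.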


Solution

  • There will be no performance difference in how you write (new) data into those buffers from the CPU's perspective. In either case you have four contiguous memory regions that you write to when you update the vertices' attribute data. The only difference is that in the former case those regions are separated by an unknown number of bytes (because each direct buffer is allocated separately), while in the latter case you know the offset between each two consecutive regions, because you allocated them all in a single buffer allocation.

    What will make a difference, however, is how you actually map those client-side host-memory regions to server-side OpenGL buffer object memory. I assume that once you have updated the host-side memory, you will upload it into server-side OpenGL buffer objects rather than passing client/host-side memory pointers to the OpenGL vertex specification commands (client-side vertex arrays are only available in an OpenGL compatibility profile context).

    In that case, having four separate contiguous client-side memory regions will require you to issue four OpenGL buffer upload commands (glBufferSubData()) and the OpenGL driver to perform four distinct Direct Memory Access (DMA) uploads over PCIe. If you only have one contiguous client-side memory region, you can issue a single glBufferSubData() call that uploads all vertex attributes' data into a single buffer object, and then use byte offsets in the OpenGL vertex specification calls (e.g. glVertexAttribPointer()).

    Another possibility is not to allocate the client-side host memory yourself at all, but to have OpenGL provide host-visible, persistently mapped buffer regions (glBufferStorage() + glMapBufferRange()), which you can write to and then either flush explicitly or have the OpenGL driver update implicitly/coherently. As with four individual client-side memory regions, you will likely pay the cost of four distinct DMA transfers when you map and flush four distinct OpenGL buffer object regions.

    So, in the end, what matters is not whether you have one or four NIO Buffer views on your client-side memory, but how many server-side OpenGL buffer objects you map those memory regions to: the fewer, the better.
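The single-buffer-object variant described in the answer can be sketched as follows. This is a hypothetical, GL-free sketch (the class PlanarLayout and the attribute locations 0 to 3 are assumptions): for the non-interleaved layout of approach (b), it computes where each attribute block starts inside one buffer object and renders, as text, the vertex specification call that would use that byte offset after a single whole-buffer upload (in LWJGL, e.g. GL15.glBufferSubData(GL_ARRAY_BUFFER, 0, meshData)). A stride of 0 tells OpenGL the attribute is tightly packed. The joint indices use glVertexAttribIPointer (OpenGL 3.0+), because glVertexAttribPointer with GL_INT would hand them to the shader converted to floats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: byte offsets of each attribute block in ONE non-interleaved buffer object.
class PlanarLayout {
    static final class Attrib {
        final int index;          // assumed shader attribute location
        final int components;     // components per vertex (3, 3, 2, 4)
        final boolean isInteger;  // joint indices stay integers in the shader
        final long byteOffset;    // start of this block inside the buffer object

        Attrib(int index, int components, boolean isInteger, long byteOffset) {
            this.index = index;
            this.components = components;
            this.isInteger = isInteger;
            this.byteOffset = byteOffset;
        }

        // Text of the vertex specification call for this block (stride 0 = tightly packed).
        String call() {
            if (isInteger) {
                return "glVertexAttribIPointer(" + index + ", " + components + ", GL_INT, 0, " + byteOffset + ")";
            }
            return "glVertexAttribPointer(" + index + ", " + components + ", GL_FLOAT, false, 0, " + byteOffset + ")";
        }
    }

    // Blocks are stored back to back: all positions, then all normals, texCoords and joint IDs.
    static List<Attrib> layout(int numOfVertices) {
        int[] components = {3, 3, 2, 4};
        boolean[] isInteger = {false, false, false, true};
        List<Attrib> result = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < components.length; i++) {
            result.add(new Attrib(i, components[i], isInteger[i], offset));
            offset += (long) numOfVertices * components[i] * 4; // float and int are both 4 bytes
        }
        return result;
    }

    public static void main(String[] args) {
        // After a single upload of the whole buffer, one pointer call per attribute is enough.
        for (Attrib a : layout(1000)) {
            System.out.println(a.call());
        }
    }
}
```

For 1000 vertices the offsets are 0, 12000, 24000 and 32000. With four separate buffer objects you would instead bind a different buffer before each pointer call, each with offset 0, and upload each one separately.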