Possible Rendering Performance Optimizations

I was doing some benchmarking today using C# and OpenTK, just to see how much I could actually render before the framerate dropped. The numbers I got were pretty astronomical, and I am quite happy with the outcome of my tests.

In my project I am loading the blender monkey, which is 968 triangles. I then instance it and render it 100 times. This means that I am rendering 96,800 triangles per frame. This number far exceeds anything that I would need to render during any given scene in my game. And after this I pushed it even further and rendered 2000 monkeys at varying locations. I was now rendering a whopping 1,936,000 (almost 2 million triangles per frame) and the framerate was still locked at 60 frames per second! That number just blew my mind. I pushed it even further and finally the framerate started to drop, but this just means that the limit is roughly 4 million triangles per frame with instancing.

I was just wondering though, because I am using some legacy OpenGL, if this could still be pushed even further—or should I even bother?

For my tests I load the blender monkey model, store it into a display list using the deprecated calls like:

modelMeshID = MeshGenerator.Generate( delegate {
            GL.Begin( PrimitiveType.Triangles );
            foreach( Face f in model.Faces ) {
                foreach( ModelVertex p in f.Points ) {
                    Vector3 v = model.Vertices[ p.Vertex ];
                    Vector3 n = model.Normals[ p.Normal ];
                    Vector2 tc = model.TexCoords[ p.TexCoord ];
                    GL.Normal3( n.X , n.Y , n.Z );
                    GL.TexCoord2( tc.Y , tc.X );
                    GL.Vertex3( v.X , v.Y , v.Z );
                }
            }
            GL.End();
        } );

and then call that list x amount of times. My question though, is if I could speed this up if I threw VAO's (Vertex Array Objects) into the display list instead of the old GL.Vertex3 api? Would this effect performance at all? Or would it create the same outcome with the display list?

Here is a screen grab of a few thousand:

enter image description here

My system specs:

CPU: AMD Athlon IIx4(quad core) 620 2.60 GHz
Graphics Card: AMD Radeon HD 6800

Solution

My question though, is if I could speed this up if I threw VAO's (Vertex Array Objects) into the display list instead of the old GL.Vertex3 api? Would this effect performance at all? Or would it create the same outcome with the display list?

No.

The main problem you're going to run into is, that Display Lists and Vertex Arrays don't go well with each other. Using buffer objects they kind of work, but display lists themself are legacy like the immediate mode drawing API.

However, even if you manage to get the VBO drawing from within a display list right, there'll be slightly an improvement: When compiling the display list the OpenGL driver knows, that everything that is arriving will be "frozen" eventually. This allows for some very aggressive internal optimization; all the geometry data will be packed up into a buffer object on the GPU, state changes are coalesced. AMD is not quite as good at this game as NVidia, but they're not bad either; display lists are heavily used in CAD applications and before ATI addressed the entertainment market, they were focused on CAD, so their display list implementation is not bad at all. If you pack up all the relevant state changes required for a particular drawing call into the display list, then when calling the display list you'll likely drop into the fast path.

I pushed it even further and finally the framerate started to drop, but this just means that the limit is roughly 4 million triangles per frame with instancing.

What's actually limiting you there is the overhead on calling the display list. I suggest you add a little bit more geometry into the DL and try again.

Display Lists are shockingly efficient. That they got removed from modern OpenGL is mostly because they can be effectively used only with the immediate mode drawing commands. Also recent things like transform feedback and conditional rendering would have been very difficult to integrate into display lists. So they got removed; and rightfully so, because Display Lists are kind of awkward to work with.

Now if you look at Vulkan the essential idea is to set up as much of the drawing commands (state changes, resource bindings and so on) upfront in command buffers and reuse those for varying data. This is like if you could create multiple display lists and have them make babies.