I'm making a 2D game that draws a huge number of overlapping quads to the screen. Which quad ends up in front of which doesn't really matter.
If I give each quad its own z value, counting up from 0, and set glDepthFunc(GL_LESS), I get quite a nice speed boost, as you would expect: fragments of quads that are totally or partially hidden behind other quads are rejected instead of drawn. So I draw the quads using something like:
float small = 1.0f / 1000000.0f;  // z step between consecutive quads
for (int iii = 0; iii < 100000; iii++) {
    freeSpace = bullets[iii]->draw(opengl, freeSpace, iii * small);  // unique z per quad
}
However, as I don't use the z value for actual depth, it seems like I should be able to just write:
for (int iii = 0; iii < 100000; iii++) {
    freeSpace = bullets[iii]->draw(opengl, freeSpace, 0.0f);  // same z for every quad
}
Or just hard-code the z value of 0.0f in the shader. (The third argument is the z value, and it ends up in gl_Position in the shader unchanged.)
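As a sanity check that the tiny z step in the first version really gives every quad a distinct depth, here is a rough CPU-side sketch. It assumes things I have not verified on my setup: a 24-bit fixed-point depth buffer, the default glDepthRange(0, 1), and w == 1. The function names quantizeDepth24 and depthsStrictlyIncrease are made up for illustration and are not part of OpenGL.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: maps an NDC z value to a 24-bit fixed-point depth,
// assuming glDepthRange(0, 1) and w == 1 (assumptions, not queried from GL).
std::uint32_t quantizeDepth24(float ndcZ) {
    const double windowDepth = 0.5 * static_cast<double>(ndcZ) + 0.5;
    return static_cast<std::uint32_t>(std::lround(windowDepth * 16777215.0));
}

// Returns true if quads drawn with z = i * step (i = 0 .. count - 1) each get
// a strictly larger quantized depth than the quad before them.
bool depthsStrictlyIncrease(int count, float step) {
    std::uint32_t previous = quantizeDepth24(0.0f);
    for (int i = 1; i < count; ++i) {
        const std::uint32_t current = quantizeDepth24(i * step);
        if (current <= previous) {
            return false;  // two quads collapsed onto the same stored depth
        }
        previous = current;
    }
    return true;
}
```

Under those assumptions a step of 1/1000000 in z is roughly 8 depth-buffer units, so depthsStrictlyIncrease(100000, 1.0f / 1000000.0f) holds, while a step of 0.0f obviously puts every quad at the same depth.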
The strange thing is that the second method (where I set the z value to 0.0f every time) gets only about half the framerate of the first.
Why is this? Both versions use glDepthFunc(GL_LESS) and the same draw call:
glDrawArrays(GL_TRIANGLES, 0, 100000*(2*3));
If anything, I would expect setting z to 0.0f every time to be faster. Why is it not?
I'm not positive, but my speculation is that the small delta in z values between primitives is what allows the zcull hardware to work. Zcull rejects occluded fragments before they ever reach the fragment shader. Besides avoiding the fragment shader work, this early culling can run at a higher rate than the normal per-fragment depth test, which is presumably where the fragments end up being rejected when every z value is identical.
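To show why the two versions render the same image even though they perform differently, here is a small CPU-side sketch of a GL_LESS test at a single pixel. This is not real GL code and says nothing about how zcull is implemented; countPassingFragments and makeDepths are names I made up, and the depth buffer is assumed to be cleared to 1.0 (the default clear depth).

```cpp
#include <vector>

// Counts how many fragments pass a GL_LESS-style test at one pixel,
// starting from a depth buffer cleared to 1.0 (assumed default).
int countPassingFragments(const std::vector<float>& fragmentDepths) {
    float storedDepth = 1.0f;
    int passed = 0;
    for (float depth : fragmentDepths) {
        if (depth < storedDepth) {  // GL_LESS: pass only if strictly nearer
            storedDepth = depth;
            ++passed;
        }
    }
    return passed;
}

// Builds the depth sequence for `count` fully overlapping quads drawn in
// order with z = i * step (step == 0.0f models the "always 0.0f" version).
std::vector<float> makeDepths(int count, float step) {
    std::vector<float> depths;
    depths.reserve(count);
    for (int i = 0; i < count; ++i) {
        depths.push_back(i * step);
    }
    return depths;
}
```

With either makeDepths(100000, 1.0f / 1000000.0f) or makeDepths(100000, 0.0f), only the first fragment passes, since 0 < 0 is false under GL_LESS. So the final image is the same in both versions, and any framerate difference would have to come from where in the pipeline the other fragments get rejected, which is the part I am speculating about.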