android performance opengl-es vertex-buffer

Rotate individual polygons in an OpenGL vertex array?

I am working on a game for Android using OpenGL ES, and I have run into a performance problem.

What I am trying to do: I have a bunch of objects on screen that all share the same mesh, but all have individual rotations and translations. You could compare it to Asteroids where you have a bunch of asteroids moving around on screen.

The actual problem: I am taking a performance hit because I rotate and translate each object individually, and the overhead of sending the vertex array for each one is too large relative to the number of vertices (tens per object).

What can I do? One solution I have thought of is to update the vertices in software myself, before putting them into the vertex buffer. That would probably spare me some overhead, but it seems counterintuitive.

Please share any ideas or suggestions you might have! Thanks!

Solution

"the overhead of sending the vertex array" seems to imply you're not using server-side buffers for the vertices/indices. If this is the case, take a look at section 2.9 of the GLES 1.1 spec, "Buffer Objects".

Of course, even if you're using server-side buffers, sufficiently many small glDrawElements calls could easily be a performance bottleneck.

If all your objects are static, you could just pre-transform them all and pay N times the memory on the server.
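As a rough illustration of pre-transforming, here is a minimal Java sketch that bakes each object's rotation and translation into one combined vertex array on the CPU, so the whole set can be drawn with a single call. The class and method names (MeshBatcher, bake, replicateIndices) are made up for this example, and it assumes 2D positions packed as x, y pairs and a GL_TRIANGLES index list. The same routine is what "updating the vertices in software" would look like if it were re-run every frame for moving objects.

```java
// Hypothetical helper, not part of any OpenGL ES API: it only prepares arrays on the CPU.
final class MeshBatcher {
    private MeshBatcher() {}

    /**
     * Returns one array holding a transformed copy of the mesh per object.
     * mesh:    2D positions packed as x0, y0, x1, y1, ...
     * angles:  one rotation (radians, counter-clockwise) per object
     * offsets: one translation per object, packed as tx0, ty0, tx1, ty1, ...
     */
    static float[] bake(float[] mesh, float[] angles, float[] offsets) {
        int floatsPerCopy = mesh.length;
        float[] out = new float[floatsPerCopy * angles.length];
        for (int i = 0; i < angles.length; i++) {
            float c = (float) Math.cos(angles[i]);
            float s = (float) Math.sin(angles[i]);
            float tx = offsets[2 * i];
            float ty = offsets[2 * i + 1];
            int base = i * floatsPerCopy;
            for (int v = 0; v < floatsPerCopy; v += 2) {
                float x = mesh[v];
                float y = mesh[v + 1];
                // Rotate about the mesh origin, then translate.
                out[base + v] = c * x - s * y + tx;
                out[base + v + 1] = s * x + c * y + ty;
            }
        }
        return out;
    }

    /**
     * Repeats a GL_TRIANGLES index list once per copy, shifting each copy's
     * indices by the number of vertices that precede it in the baked array.
     * Assumes the total vertex count fits in 16-bit unsigned indices.
     */
    static short[] replicateIndices(short[] indices, int vertsPerCopy, int copies) {
        short[] out = new short[indices.length * copies];
        for (int i = 0; i < copies; i++) {
            for (int k = 0; k < indices.length; k++) {
                out[i * indices.length + k] = (short) (indices[k] + i * vertsPerCopy);
            }
        }
        return out;
    }
}
```

The baked positions and replicated indices would then be uploaded once to buffer objects and drawn with one glDrawElements call. The cost is the N-fold memory mentioned above, plus the CPU time of bake whenever anything moves.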

If your objects are dynamic, things are more tricky. "Instanced" drawing (see, for example, Direct3D's DrawInstanced) could help, but I don't believe the core GLES 1.1 or 2.0 APIs have anything like that ("instanced" drawing would also save memory in the static case).

Using GLES 2.0, you could try something like:

1. Put M copies of the mesh into a vertex buffer, giving each vertex an additional attribute which is the index of the copy.
2. In the vertex shader, load the transformation matrix (or a subsection of it, if some of it is fixed) from a uniform array, indexed by the additional attribute.

Then you could do N/M glDrawElements calls (N being the total number of objects), setting up the M matrices in the uniform array each time. It's not clear that this would actually be faster: for one, the hardware would have to work harder (indexed uniforms aren't super-cheap). Also, I don't think anything like that is possible in GLES 1.1.
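To make the GLES 2.0 idea more concrete, here is a minimal Java sketch of the CPU-side setup, with the vertex shader embedded as a string. Everything named here is an assumption for illustration: the class IndexedCopies, the batch size of 16, the attribute and uniform names (aPosition, aCopyIndex, uModel, uViewProj), and the 2D interleaved layout. The copy index is stored as a float because GLSL ES 1.00 vertex attributes must be floating-point types.

```java
// Hypothetical helper: builds the replicated vertex data and holds the shader source.
final class IndexedCopies {
    private IndexedCopies() {}

    /** M in the description above; an arbitrary choice for this sketch. */
    static final int COPIES_PER_BATCH = 16;

    /** GLSL ES 1.00 vertex shader that picks its model matrix by copy index. */
    static final String VERTEX_SHADER =
            "uniform mat4 uViewProj;\n"
            + "uniform mat4 uModel[" + COPIES_PER_BATCH + "];\n"
            + "attribute vec2 aPosition;\n"
            + "attribute float aCopyIndex;\n"
            + "void main() {\n"
            + "    int i = int(aCopyIndex);\n"
            + "    gl_Position = uViewProj * uModel[i] * vec4(aPosition, 0.0, 1.0);\n"
            + "}\n";

    /**
     * Interleaves x, y, copyIndex for each vertex of each copy.
     * mesh: 2D positions packed as x0, y0, x1, y1, ...
     */
    static float[] buildVertices(float[] mesh, int copies) {
        int verts = mesh.length / 2;
        float[] out = new float[verts * 3 * copies];
        int o = 0;
        for (int c = 0; c < copies; c++) {
            for (int v = 0; v < verts; v++) {
                out[o++] = mesh[2 * v];
                out[o++] = mesh[2 * v + 1];
                out[o++] = c; // same index for every vertex of this copy
            }
        }
        return out;
    }

    /** Number of draw calls for a given object count, rounding N/M up. */
    static int drawCalls(int objects, int copiesPerBatch) {
        return (objects + copiesPerBatch - 1) / copiesPerBatch;
    }
}
```

On Android, each batch would then upload its matrices with something like GLES20.glUniformMatrix4fv(location, count, false, matrices, offset) before its glDrawElements call. Each mat4 takes four uniform vectors, so the batch size has to fit within the device's GL_MAX_VERTEX_UNIFORM_VECTORS limit.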