Which of the following options is better for per-instance data?
I'm not using multiple VAOs to draw one set of instances. I mean for multiple sets of instances spanning multiple VAOs.
There's really no single approach that will be best in all cases. If there were one way that always performed best, APIs like OpenGL would not offer all these flexible options.
Some factors that will influence what will be best:
Looking at some typical use cases:
In all of these scenarios, if you have multiple instances of the same object type, i.e. objects that have the same geometry, you will of course want to share the vertex data between them.
Now, one of the questions you may ask about these generic guidelines is: What exactly is "many" objects? Where is the limit between "few" and "many"?
The answer to this depends heavily on the performance characteristics of the hardware/platform. To give at least a rough order of magnitude, I would expect the most common platforms to be able to handle between a few hundred thousand and a few million VBO switches and draw calls per second. If you divide that by a target of 60 fps, and want to avoid spending a majority of your total performance budget in this area, I would start worrying about the number of VBO binds and draw calls at around 1,000 per frame on lower-performance platforms, while high-performance platforms might not break a sweat if you go at least an order of magnitude higher.
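To make that arithmetic concrete, here is a minimal sketch of the estimate. The function name `callsPerFrameBudget`, the 20% budget share, and the throughput figures are assumptions picked for illustration, not measurements; you would substitute numbers profiled on your own target hardware.

```cpp
// Rough per-frame budget for VBO binds and draw calls.
// callsPerSecond: throughput the platform can sustain (assumed; measure it)
// fps:            target frame rate
// budgetPercent:  share of the frame you are willing to spend on these
//                 calls (hypothetical knob, e.g. 20 for 20%)
constexpr long long callsPerFrameBudget(long long callsPerSecond,
                                        int fps,
                                        int budgetPercent) {
    return callsPerSecond * budgetPercent / (static_cast<long long>(fps) * 100);
}

// Lower-end assumption: 300,000 calls/s at 60 fps, 20% of the frame.
static_assert(callsPerFrameBudget(300000, 60, 20) == 1000,
              "about 1,000 binds/draws per frame");

// Higher-end assumption: 3,000,000 calls/s, same frame rate and share.
static_assert(callsPerFrameBudget(3000000, 60, 20) == 10000,
              "an order of magnitude more headroom");
```

With these assumed inputs, the low-end case lands at roughly 1,000 calls per frame and the high-end case at about ten times that, which lines up with the rough thresholds given above. The real limits depend on the driver and on what else the frame is doing, so treat the result as a starting point for profiling rather than a hard cutoff.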