Tags: optimization, webgl

Working around WebGL readPixels being slow


I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of GPU-land and into JavaScript-land.

As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:

[Image: circuit animation]

The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).

What are some tricks for speeding up readPixels in this kind of use case?

  • Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
  • Should I try to make the readPixels calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
  • Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
  • Should I be using a different method to get the information back out of the textures?
  • Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?

Solution

  • Should I try to make the readPixels calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?

    Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens. It has to wait for the GPU to finish the work already queued and then send the data back, whereas normal draw calls are just queued and return immediately.

    Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
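
As a rough illustration of the "single combined buffer" idea, here is a minimal sketch. It assumes (hypothetically) that every simulation step has already been rendered into its own horizontal band of one shared RGBA framebuffer, `stepHeight` rows per step, and that RGBA/UNSIGNED_BYTE output is acceptable. The names `readAllSteps`, `framebuffer`, `stepHeight` and `stepCount` are made up for this example and are not from the question.

```javascript
// Sketch only: assumes all steps were drawn into one framebuffer, stacked
// vertically, with `stepHeight` rows of `width` RGBA pixels per step.
function readAllSteps(gl, framebuffer, width, stepHeight, stepCount) {
  const bytesPerStep = width * stepHeight * 4; // RGBA, one byte per channel
  const pixels = new Uint8Array(bytesPerStep * stepCount);

  // One blocking read for the whole atlas, issued after every draw call.
  gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
  gl.readPixels(0, 0, width, stepHeight * stepCount,
                gl.RGBA, gl.UNSIGNED_BYTE, pixels);

  // Split the result into per-step views. subarray() does not copy.
  const steps = [];
  for (let i = 0; i < stepCount; i++) {
    steps.push(pixels.subarray(i * bytesPerStep, (i + 1) * bytesPerStep));
  }
  return steps;
}
```

With the eight-column circuit from the question this would be called once per frame, e.g. `readAllSteps(gl, fb, width, stepHeight, 8)`, so the pipeline stalls once instead of eight times.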

  • Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?

    This will get you immensely better performance.

    If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement). Everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes that say which simulation-state texel it should depend on).
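
To make the static-vertex-buffer idea a bit more concrete, here is one possible sketch of building that buffer on the CPU once. It assumes a hypothetical layout where the state texture has one texel per (column, wire) pair and each indicator is a plain quad on a regular grid. `buildIndicatorVertices` and the interleaved `x, y, u, v` format are assumptions for illustration, not anything from the question.

```javascript
// Sketch only: one quad (two triangles) per (column, wire) slot.
// Each vertex is 4 floats: x, y (clip-space position) and
// u, v (the state texel this quad should read, at the texel centre).
function buildIndicatorVertices(columns, wires) {
  const floatsPerVertex = 4;
  const data = new Float32Array(columns * wires * 6 * floatsPerVertex);
  let o = 0;
  for (let c = 0; c < columns; c++) {
    for (let w = 0; w < wires; w++) {
      const x0 = -1 + (2 * c) / columns;
      const x1 = -1 + (2 * (c + 1)) / columns;
      const y0 = -1 + (2 * w) / wires;
      const y1 = -1 + (2 * (w + 1)) / wires;
      const u = (c + 0.5) / columns; // same texel for all six vertices
      const v = (w + 0.5) / wires;
      const corners = [
        [x0, y0], [x1, y0], [x1, y1],
        [x0, y0], [x1, y1], [x0, y1],
      ];
      for (const [x, y] of corners) {
        data[o++] = x;
        data[o++] = y;
        data[o++] = u;
        data[o++] = v;
      }
    }
  }
  return data;
}
```

The result would be uploaded once with `gl.bufferData(gl.ARRAY_BUFFER, data, gl.STATIC_DRAW)`. Per frame, only the state texture changes, and the shader samples it at the `u, v` attribute to decide the quad's color or fill level.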

    The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
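
The divide/modulo step is small enough to sketch. The version below is a CPU-side JavaScript reference for the arithmetic, assuming a non-negative integer value (say, a percentage) and a hypothetical spritesheet with the ten digits laid out left to right in equal-width cells. In the shader the same thing would be written with GLSL's `floor` and `mod`. `digitAt` and `digitUvRange` are names invented for this example.

```javascript
// Sketch only: extract the decimal digit at a given place
// (place 0 = ones, 1 = tens, 2 = hundreds) from a non-negative integer.
function digitAt(value, place) {
  return Math.floor(value / Math.pow(10, place)) % 10;
}

// Horizontal texture-coordinate range of a digit's cell, assuming a
// spritesheet with glyphs 0..9 in ten equal-width columns.
function digitUvRange(digit) {
  return { u0: digit / 10, u1: (digit + 1) / 10 };
}
```

For a value of 73, `digitAt(73, 1)` picks the glyph for the tens quad and `digitAt(73, 0)` the glyph for the ones quad. Each quad then samples the spritesheet inside its `digitUvRange`.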