Timing WebGL and the sequentiality of WebGL calls

I am currently implementing a GPGPU sorter in WebGL by rendering to textures. While I have a working sorter, I am having difficulty timing its execution, especially to compare it with the default JS sort.

I have three main functions for the GPU sorting:

initGpu(..) - sets up the textures, buffers, framebuffers, etc.
sortGpu(..) - sets uniforms and runs the shader programs that sort the input texture, drawing the result to a framebuffer and its attached texture
readFB(..) - dumps the contents of a given framebuffer using readPixels

To time the CPU sort, I simply wrap the call in a time difference, e.g.:

const a = [1, ..., 100];
const then = performance.now();
a.sort();
console.log(`${performance.now() - then}ms`)
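The same measure-around-the-call pattern can be wrapped in a small reusable helper. This is a minimal sketch: `timeMs` is a hypothetical name, and the input is a large array of random numbers rather than `[1, ..., 100]`, since an already-sorted array is an easy case for many sort implementations. It also passes a numeric comparator, because `Array.prototype.sort()` without one compares elements as strings.

```javascript
// Hypothetical helper: runs fn once and returns { result, ms }.
// performance.now() is available in browsers and as a global in Node 16+.
function timeMs(fn) {
  const start = performance.now();
  const result = fn();
  const ms = performance.now() - start;
  return { result, ms };
}

// Random input so the sort has real work to do.
const n = 100000;
const a = Array.from({ length: n }, () => Math.random() * n);

// Numeric comparator: the default sort() compares elements as strings.
const { ms } = timeMs(() => a.sort((x, y) => x - y));
console.log(`CPU sort of ${n} numbers: ${ms.toFixed(3)}ms`);
```

Browsers deliberately coarsen the resolution of `performance.now()`, so a single run of a fast operation mostly measures timer granularity; larger inputs or averaging over several runs give steadier numbers.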

Wrapping sortGpu(..) in the same way always gives roughly the same result (~0.005ms), no matter how large the input array is, right up to the point where the draw call takes longer than the maximum allowed time and the WebGL context is lost. I would understand the timings staying flat up to a point, but my GPU has ~1000 CUDA cores, so beyond that many elements it should definitely slow down.

It is my understanding that calls to gl are entirely sequential in JS: in the snippet below, x would only be modified after the drawing is complete, and this is part of what makes batched drawing more efficient.

gl.drawArrays(...);
x += 10;

readFB(..) (and therefore, I assume, readPixels(..)) forces this sequentiality, since otherwise the array it returns would not be reliable. Knowing this, I realise I could use the method above to time sortGpu(..); readFB(..) accurately, but that would include readback overhead I'm not interested in: I intend to keep the output as a texture to use elsewhere in further GPGPU passes.
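A readback-based timing does not have to copy the whole texture. The sketch below reads back a single pixel from the output framebuffer, which still forces a wait (readPixels must return the contents as if every earlier draw had completed, and the last sort pass depends on all earlier passes) while keeping the transfer tiny. `timeSortWithPixelSync`, `runSort` and `outputFb` are hypothetical names, and it assumes `outputFb` is an RGBA/UNSIGNED_BYTE framebuffer; for a float render target, the format/type pair passed to `readPixels` must be one the context accepts for that buffer (e.g. `gl.FLOAT` with a `Float32Array`).

```javascript
// Sketch: time a sort, forcing completion by reading back one pixel.
// Assumptions (hypothetical): runSort() issues every sort pass, and the
// final result lands in outputFb, an RGBA / UNSIGNED_BYTE framebuffer.
function timeSortWithPixelSync(gl, runSort, outputFb) {
  const pixel = new Uint8Array(4); // room for one RGBA texel
  const start = performance.now();
  runSort();
  // readPixels cannot return until the draws that wrote outputFb have run.
  gl.bindFramebuffer(gl.FRAMEBUFFER, outputFb);
  gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, pixel);
  return performance.now() - start;
}

// Usage (browser): timeSortWithPixelSync(gl, () => sortGpu(/* ... */), outputFb);
```

The helper leaves `outputFb` bound, so restore whatever framebuffer binding the rest of your code expects afterwards.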

Solution

It is my understanding that calls to gl are entirely sequential in js

Well... yes and no. In general, the GPU will perform its computations asynchronously with respect to the CPU. The CPU will only stop and wait for the GPU when it is necessary to preserve the illusion of sequential consistency. As you observed, reading back data from the GPU is one action that will necessitate that the CPU waits. However, there is a function that might help: glFinish (which is accessible in WebGL as gl.finish()).
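The asynchronous behaviour can be illustrated with a toy model. `ToyGl` below is hypothetical and is not WebGL: issuing a draw only records the work, and the work runs when the caller hits a synchronization point (`finish()` or `readPixels()`). Real drivers start executing as soon as commands are flushed and run them on the GPU in parallel with JavaScript, so the model exaggerates the split, but it shows why timing only the issuing calls reports a tiny, size-independent number.

```javascript
// Toy model (not real WebGL): issuing a command only queues it; the
// "GPU" runs queued work when the caller has to wait for results.
class ToyGl {
  constructor() {
    this.queue = [];   // commands issued but not yet executed
    this.executed = 0; // how many commands the "GPU" has run
  }
  drawArrays(work) {
    this.queue.push(work); // returns immediately, like a real draw call
  }
  finish() {
    // Block until every queued command has run.
    while (this.queue.length > 0) {
      this.queue.shift()();
      this.executed++;
    }
  }
  readPixels() {
    this.finish(); // results must reflect all earlier draws
    return this.executed; // stands in for the pixel data
  }
}

// Simulated expensive GPU work.
const busyWork = () => { let s = 0; for (let i = 0; i < 5e6; i++) s += i; return s; };

const gl = new ToyGl();
let t0 = performance.now();
for (let i = 0; i < 10; i++) gl.drawArrays(busyWork);
const issueMs = performance.now() - t0;  // tiny: nothing has run yet
t0 = performance.now();
gl.finish();
const finishMs = performance.now() - t0; // the real cost shows up here
console.log(`issue: ${issueMs.toFixed(3)}ms, finish: ${finishMs.toFixed(3)}ms`);
```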

According to the OpenGL ES 2.0 specification (on which WebGL 1.0 is based):

5.1

The command

void Finish( void );

forces all previous GL commands to complete. Finish does not return until all effects from previously issued commands on GL client and server state and the framebuffer are fully realized.

Assuming that your browser respects this request (and doesn't just ignore it), you should be able to insert a gl.finish() call just after the sortGpu(..) call, before taking the second performance.now() reading, to get more accurate timings.
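Put together, the timing becomes something like the sketch below, again assuming the browser really blocks in `gl.finish()`. `timeGpuSort` is a hypothetical helper; `sortGpu` is the question's function, with its arguments passed through unchanged, and `gl` is the context it draws with.

```javascript
// Sketch: time a GPU sort by forcing completion with gl.finish().
// Assumes the browser actually blocks in finish() rather than ignoring it.
function timeGpuSort(gl, sortGpu, ...args) {
  gl.finish(); // drain earlier work (e.g. initGpu uploads) so it isn't counted
  const start = performance.now();
  sortGpu(...args);
  gl.finish(); // wait until the sort's draw calls have actually executed
  return performance.now() - start;
}

// Usage (browser, after initGpu(..)):
// const ms = timeGpuSort(gl, sortGpu, /* sortGpu's usual arguments */);
// console.log(`GPU sort: ${ms.toFixed(3)}ms`);
```

The first `gl.finish()` keeps work still pending from earlier calls such as `initGpu(..)` from being charged to the sort. It is also usually worth discarding the first measurement, since one-time costs (for example, driver work deferred until a program is first used) can land there.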