When I set glStencilFunc( GL_NEVER, . . . ), effectively disabling all drawing, and then run my [shader-bound] program, I get no performance increase over letting the fragment shader run. I thought the stencil test happened before the fragment program. Is that not the case, or at least not guaranteed? Replacing the fragment shader with one that simply writes a constant to gl_FragColor does result in a higher FPS.
Take a look at the following outline of the DX10 pipeline; it shows the stencil test running before the pixel shader:
and the same is true in DX11:
http://4.bp.blogspot.com/_2YU3pmPHKN4/S1KhDSPmotI/AAAAAAAAAcw/d38b4oA_DxM/s1600-h/DX11.JPG
I don't know whether the OpenGL spec mandates this, but an implementation would hurt its own performance by not running the stencil test before the fragment program.
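To make the reasoning concrete, here is a toy model in C of why the ordering matters for the GL_NEVER case in the question. It is only a sketch: PipelineState and shader_invocations are hypothetical names made up for illustration, not part of OpenGL or any driver, and the "shader_blocks_early" flag is an assumption standing in for shader features (such as discard or writing gl_FragDepth) that can keep an implementation from running the test early.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and do not exist in OpenGL. */
typedef struct {
    bool stencil_always_fails; /* models glStencilFunc(GL_NEVER, ...) */
    bool shader_blocks_early;  /* assumption: shader uses a feature that
                                  prevents the test from running early */
} PipelineState;

/* Returns how many times the fragment shader would run for `fragments`
   rasterized fragments under this simplified model. */
size_t shader_invocations(PipelineState s, size_t fragments)
{
    bool early_test_possible = !s.shader_blocks_early;

    if (early_test_possible && s.stencil_always_fails) {
        /* Every fragment is rejected before shading, so no shader work. */
        return 0;
    }

    /* Either the stencil test passes, or it only runs after shading:
       in both cases every fragment is shaded. */
    return fragments;
}
```

In this model, GL_NEVER with an early test costs zero shader invocations, while the same state with a late test shades every fragment and then throws the result away. That second case would match what the question describes: no speedup from GL_NEVER, but a speedup from a cheaper shader.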