
Do intermediate variables in WebGL/GLSL improve performance with no downsides?


When a value is used two or more times, does storing it in an intermediate variable systematically improve performance, with no downsides?

Here is a concrete example:

vec4 right = doubleUV + vec4(size.x, 0.0, 2.0 * size.x, 0.0);
vec4 left  = doubleUV - vec4(size.x, 0.0, 2.0 * size.x, 0.0);
vec4 up    = doubleUV + vec4(0.0, size.y, 0.0, 2.0 * size.y);
vec4 down  = doubleUV - vec4(0.0, size.y, 0.0, 2.0 * size.y);  

We can see that 2.0 * size.x and 2.0 * size.y are each used multiple times. My instinct is to refactor this code with intermediate variables:

float sizex2 = 2.0 * size.x;
vec4 right = doubleUV + vec4(size.x, 0.0, sizex2, 0.0);
vec4 left  = doubleUV - vec4(size.x, 0.0, sizex2, 0.0);

float sizey2 = 2.0 * size.y;
vec4 up    = doubleUV + vec4(0.0, size.y, 0.0, sizey2);
vec4 down  = doubleUV - vec4(0.0, size.y, 0.0, sizey2);  

Readability aside, is this a "no-brainer" performance practice that can be applied systematically? Or should I weigh the cost of an extra multiplication against the cost of allocating a variable?

As a side question: can extra temporary variables hurt performance? This is hard to test, because the GLSL code targets WebGL and will be compiled by a wide variety of compilers. Are some GLSL compilers smart enough to factor out small redundant pieces of code on their own?


Solution

  • For a simple repeated multiplication, a temporary variable is not going to improve performance. For more complex operations (such as reciprocals, squares, square roots, and dot products), introducing extra variables can have a noticeable impact on compilers that do not perform this optimization themselves.
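
    A minimal sketch of the "more complex operations" case, assuming a hypothetical WebGL 1 fragment shader whose inputs `vLightVec` and `vNormal` are invented for illustration and do not come from the question. The dot product is computed once and feeds both the normalization and the falloff:

    ```glsl
    precision mediump float;

    // Hypothetical inputs, for illustration only.
    varying vec3 vLightVec;  // unnormalized vector from the surface to the light
    varying vec3 vNormal;    // interpolated surface normal

    void main() {
        // Squared distance: computed once, used for both invDist and attenuation.
        float distSq  = dot(vLightVec, vLightVec);
        float invDist = inversesqrt(distSq);

        // Normalize by multiplying with the cached inverse length.
        vec3 lightDir = vLightVec * invDist;

        // Illustrative falloff that reuses distSq instead of recomputing the dot product.
        float attenuation = 1.0 / (1.0 + distSq);

        float diffuse = max(dot(normalize(vNormal), lightDir), 0.0);
        gl_FragColor = vec4(vec3(diffuse * attenuation), 1.0);
    }
    ```

    Whether this beats writing `dot(vLightVec, vLightVec)` twice depends on the compiler; the point is only that the source no longer relies on the compiler to spot the repetition.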

    I would not worry about the cost of the temporary variables themselves, as they are stored in the GPU's vector registers.
    But if you add more variables than there are registers, you risk either:

    • shader compilation failing on some GPUs.
    • register spilling, where variables have to be stored in and accessed from the GPU execution unit's local memory instead of registers, which is slower (see the ARM mobile GPU docs).

    An alternative that I, at least, find clearer:

    vec4 v1 = vec4(size.x, 0.0, 2.0 * size.x, 0.0);
    vec4 right = doubleUV + v1;
    vec4 left  = doubleUV - v1;
    
    vec4 v2 = vec4(0.0, size.y, 0.0, 2.0 * size.y);
    vec4 up    = doubleUV + v2;
    vec4 down  = doubleUV - v2; 
    

    Here you can see the GGX shading model and the complexity of the operations that are grouped into variables.
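
    As a rough illustration of that kind of grouping, here is a sketch of the GGX (Trowbridge-Reitz) normal distribution term. It is not necessarily the code behind the link above; the function name `distributionGGX`, its parameters, and the `alpha = roughness * roughness` remapping are assumptions made for this example.

    ```glsl
    precision mediump float;

    const float PI = 3.14159265359;

    // GGX / Trowbridge-Reitz normal distribution term:
    //   D = a2 / (PI * ((n.h)^2 * (a2 - 1) + 1)^2)
    // n and h are assumed to be normalized.
    float distributionGGX(vec3 n, vec3 h, float roughness) {
        float a      = roughness * roughness;  // assumed remapping, a common convention
        float a2     = a * a;                  // used in the numerator and the denominator
        float nDotH  = max(dot(n, h), 0.0);
        float nDotH2 = nDotH * nDotH;
        float denom  = nDotH2 * (a2 - 1.0) + 1.0;  // squared below, so computed once
        return a2 / (PI * denom * denom);
    }
    ```

    Written as a single expression, `a2` and `denom` would each appear twice; naming them keeps the formula readable and avoids depending on the compiler to merge the repeated subexpressions.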