Is it faster to use texelFetch when rendering fonts?

I am writing some font drawing shaders in OpenGL 3.3. I will render my font into a texture atlas and then generate some display lists for some text I want to draw. I would like the rendering of text to consume the least amount of resources (CPU, GPU memory, GPU time). How can I accomplish this?

Looking at Freetype-gl, I noticed that the author generates 6 indices and 4 vertices per character.

Since I am using OpenGL 3.3, I have some additional freedom. My plan was to generate 1 vertex per character plus one integer "code" per character. The character code can be used in texelFetch operations to retrieve texture coördinates and character size information. A geometry shader turns the size information and vertex into a triangle strip.

Is texelFetch going to be slower than sending more vertices/texture coördinates? Is this worth doing?, or is there are reason why it's not done in the font libraries I looked at?

Final code:

Vertex shader:

#version 330

uniform sampler2D font_atlas;
uniform sampler1D code_to_texture;
uniform mat4 projection;
uniform vec2 vertex_offset;  // in view space.
uniform vec4 color;
uniform float gamma;

in vec2 vertex;  // vertex in view space of each character adjusted for kerning, etc.
in int code;

out vec4 v_uv;

void main()
{
    v_uv = texelFetch(
            code_to_texture,
            code,
            0);
    gl_Position = projection * vec4(vertex_offset + vertex, 0.0, 1.0);
}

Geometry shader:

#version 330

layout (points) in;
layout (triangle_strip, max_vertices = 4) out;

uniform sampler2D font_atlas;
uniform mat4 projection;

in vec4 v_uv[];

out vec2 g_uv;

void main()
{
    vec4 pos = gl_in[0].gl_Position;
    vec4 uv = v_uv[0];
    vec2 size = vec2(textureSize(font_atlas, 0)) * (uv.zw - uv.xy);
    vec2 pos_opposite = pos.xy + (mat2(projection) * size);

    gl_Position = vec4(pos.xy, 0, 1);
    g_uv = uv.xy;
    EmitVertex();

    gl_Position = vec4(pos.x, pos_opposite.y, 0, 1);
    g_uv = uv.xw;
    EmitVertex();

    gl_Position = vec4(pos_opposite.x, pos.y, 0, 1);
    g_uv = uv.zy;
    EmitVertex();

    gl_Position = vec4(pos_opposite.xy, 0, 1);
    g_uv = uv.zw;
    EmitVertex();

    EndPrimitive();
}

Fragment shader:

#version 330

uniform sampler2D font_atlas;
uniform vec4 color;
uniform float gamma;

in vec2 g_uv;

layout (location = 0) out vec4 fragment_color;

void main()
{
    float a = texture(font_atlas, g_uv).r;
    fragment_color.rgb = color.rgb;
    fragment_color.a = color.a * pow(a, 1.0 / gamma);
}

Solution

I wouldn't expect there to be a significant performance difference between your proposed method vs storing the quad vertex positions and texture coordinates in a vertex buffer. On the one hand your method requires a smaller vertex buffer and less work for the CPU. On the other hand the texelFetch calls will be more-or-less at random locations, and not make the best use of the cache. This last point may not be very significant as I guess that texture wont be very large. Also, the execution model of geometry shaders mean they can quickly become the bottleneck of the pipeline.

To answer "is this worth doing?" - I suspect not for performance reasons. Unfortunately you can't tell until you implement it and measure the performance. I think it's quite a cool idea though, so I don't think you'd be wasting your time trying it out.