
Is The GSUB Table Widely Populated In TrueType Fonts?


Long story short, I'm rendering fonts without FreeType or HarfBuzz (for various reasons), by manually parsing TrueType and derivative formats to extract metadata and glyph information, and later building bitmaps and distance fields from the outlines at runtime. Something I'm concerned about is reliable glyph substitution where it is essential, i.e. where the language rules require certain glyph sequences to be replaced by others.

What I'm unclear about is how reliable the GSUB table can generally be assumed to be. In other words, is it reasonable to expect that an Arabic font, for example, provides a populated GSUB table containing the substitutions required for the Arabic script? Or, given that this is per-script, is it generally assumed that fonts only provide special, per-font substitutions, while the shaping engine handles any per-script substitutions as global rules? I'm not concerned that the substituted glyph(s) may be unavailable, as the system searches for fallbacks in that case, else reverts to the original sequence.

Obviously having a global ruleset in place per-script would be totally reliable as a fallback, but I want to keep this as minimal as possible. Apologies that this isn't exactly an empirical question, but I'm having trouble finding much information on this, short of having to actually examine a large sample of various fonts. This overview seems to suggest that per-script substitutions will be defined, but given that the tables are modular, there is of course no guarantee that there will even be a table, let alone the required definitions. Failing this, is there any known database of substitutions for various scripts?


Solution

  • A modern OpenType font file is effectively a fully self-contained typesetting program, and a text shaper only gets to "do as instructed by the font" (even if that requires a whole bunch of complexity on the shaper's part), so there are no prebaked lists of GSUB rules bundled with shapers to be consulted outside of what the font specifies.

    Think of the font as a game ROM: while you need a good emulator (text shaper) to properly run the game (font), and it's the emulator's job to make sure all the complex bits like blitting, memory swapping, etc. get performed at the right time, the game specifies what will happen. Similarly, a good text shaper will have all the (complex) logic for how to interpret the OpenType data, and how to process it, in which order, over how many passes, etc., but that data comes only from the font, and nowhere else.

    Of course, that doesn't mean that those kinds of lists don't exist: they just don't exist in shapers. They absolutely exist in font building tools, because the job of designing typefaces would be incredibly tedious without them, but each tool has its own lists and presets, and when it generates a font all those rules are encoded into the font file itself: the font becomes the source of truth when it comes to typesetting.

    If you have a font file, you have all the information needed to shape text, provided your shaper code parses the font in compliance with the OpenType specification, and part of that compliance is that the shaper is only allowed to apply what's in the font.

    (Of course, there is some configurability, in that OpenType features are explicitly designed so that a shaper is allowed to skip applying any or all of them, but it is not allowed to add any of its own.)
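Since the font is the only source of substitution data, a hand-written parser can at least check up front whether a given font ships a GSUB table and which scripts it declares. The following is a minimal sketch of that check, not a definitive implementation. It assumes a single-font sfnt file (plain .ttf/.otf, not a TrueType Collection or WOFF), and the function names `find_table` and `gsub_script_tags` are made up for illustration. It only reads the table directory and the GSUB ScriptList; it does not parse features or lookups.

```python
import struct


def find_table(font: bytes, tag: bytes):
    """Return (offset, length) of the table named `tag`, or None if absent.

    Assumes `font` is a single-font sfnt file: a 12-byte header followed by
    16-byte table records (tag, checksum, offset, length), all big-endian.
    """
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        rec_tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i
        )
        if rec_tag == tag:
            return offset, length
    return None


def gsub_script_tags(font: bytes):
    """Return the script tags declared in GSUB, or [] if there is no GSUB table."""
    entry = find_table(font, b"GSUB")
    if entry is None:
        return []
    gsub_offset, _length = entry

    # GSUB header: majorVersion, minorVersion, then the ScriptList offset,
    # which is relative to the start of the GSUB table.
    _major, _minor, script_list_rel = struct.unpack_from(">HHH", font, gsub_offset)
    script_list = gsub_offset + script_list_rel

    # ScriptList: a count followed by 6-byte records (4-byte tag, 2-byte offset).
    count = struct.unpack_from(">H", font, script_list)[0]
    tags = []
    for i in range(count):
        (tag,) = struct.unpack_from(">4s", font, script_list + 2 + 6 * i)
        tags.append(tag.decode("ascii"))
    return tags
```

Called on the raw bytes of a font file (for example, the contents of a hypothetical `font.ttf` read in binary mode), `gsub_script_tags` returns a list of four-character script tags such as `arab`, or an empty list when the font has no GSUB table at all, in which case there is nothing for a spec-compliant shaper to substitute.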