I am trying to understand the rendering of "بڑ" (Urdu; Unicode code points 1576 and 1681, i.e. U+0628 and U+0691) with the font Jameel_Noori_Nastaleeq.ttf.
The string is converted into the glyphs [607, 460, 471, 1651] by the GSUB table. I can detect the correct anchor attachment of the second glyph under the first one, but I cannot find an appropriate GSUB subtable that would position the third glyph on top of the first one. Here, the left one is correct, and the right one is what my program does at the moment.
Also, I don't quite understand LookupType 8 of GSUB. Some Lookup tables have a LookupFlag with the ignoreMarks bit (value 0x0008) set. When matching the Backtrack, Input and Lookahead sequences, should I take these flags into account, i.e. skip marks? What exactly is the mechanism of matching and applying LookupType 8?
The positioning of both marks (the small tah, and dot of 'beh') is done through a lookup in the Mark Positioning ('mark') feature of the GPOS table, which is applied after the GSUB rules are applied. There is no GSUB-only way to get the correct final positioning. The GPOS must be processed (after the GSUB).
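To illustrate what that GPOS step does, here is a minimal sketch of mark-to-base attachment, assuming you have already read the base glyph's anchor and the mark glyph's anchor from the relevant lookup. The names `Point` and `attach_mark` and all coordinates are hypothetical, not values from this font; the only idea taken from GPOS is that the mark is placed so that the two anchor points coincide.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def attach_mark(base_origin: Point, base_anchor: Point, mark_anchor: Point) -> Point:
    """Return the origin at which to draw the mark glyph.

    base_anchor is relative to the base glyph's origin and mark_anchor is
    relative to the mark glyph's origin. The result makes both anchors
    land on the same point.
    """
    return Point(
        base_origin.x + base_anchor.x - mark_anchor.x,
        base_origin.y + base_anchor.y - mark_anchor.y,
    )


# Hypothetical numbers in font units, for illustration only.
base_origin = Point(1000, 0)
above = attach_mark(base_origin, Point(300, 900), Point(120, -50))
below = attach_mark(base_origin, Point(280, -200), Point(100, 400))
print(above, below)
```

A base glyph usually carries one anchor per mark class, which is how one mark can end up above the base and another below it.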
As to the ignoreMarks flag: it isn't specific to GSUB LookupType 8. Any lookup (GSUB or GPOS) can set this flag. It tells the layout engine to ignore mark glyphs in the sequence under consideration for the purposes of matching context, and that applies to the Backtrack, Input and Lookahead sequences alike. This allows substitution contexts to be defined with only the "root" glyphs of a sequence, so if the context rule is "A B C", a lookup with the ignoreMarks flag set would match "A (mark) B C", "A B (mark) C", "A B C", etc.
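The matching behaviour can be sketched roughly as follows. This is a simplified illustration, not how any particular engine implements it: `match_context` and the `is_mark` callback are hypothetical names, glyphs are plain strings, and only a forward input sequence is handled. The value 0x0008 for the ignoreMarks bit is the one defined in the OpenType LookupFlag.

```python
IGNORE_MARKS = 0x0008  # ignoreMarks bit of the OpenType LookupFlag


def match_context(glyphs, start, pattern, lookup_flag, is_mark):
    """Try to match `pattern` beginning at glyphs[start].

    Returns the list of glyph indices that matched, or None on failure.
    When the ignoreMarks bit is set, mark glyphs are stepped over as if
    they were not in the sequence.
    """
    skip_marks = bool(lookup_flag & IGNORE_MARKS)
    # An ignored glyph is never the starting point of a match.
    if skip_marks and start < len(glyphs) and is_mark(glyphs[start]):
        return None
    matched = []
    i = start
    for wanted in pattern:
        while skip_marks and i < len(glyphs) and is_mark(glyphs[i]):
            i += 1
        if i >= len(glyphs) or glyphs[i] != wanted:
            return None
        matched.append(i)
        i += 1
    return matched


def is_mark(glyph):
    # Hypothetical classification; a real engine would use the GDEF glyph classes.
    return glyph == "mark"


rule = ["A", "B", "C"]
with_flag = match_context(["A", "mark", "B", "C"], 0, rule, IGNORE_MARKS, is_mark)
without_flag = match_context(["A", "mark", "B", "C"], 0, rule, 0, is_mark)
print(with_flag, without_flag)
```

The same skipping would apply when walking backwards through a backtrack sequence or forwards through a lookahead sequence.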
This comes into play in this font because the two input characters are first decomposed (in the GSUB) into a sequence of base + mark glyphs, then recomposed (also in the GSUB), and finally the marks are positioned (in the GPOS).
(As an aside: why are you doing text layout yourself, as opposed to using an existing layout engine such as HarfBuzz, or the engines built into operating systems?)