python winapi unicode diacritics directwrite

DirectWrite not adjusting for diacritics

I am debugging some DirectWrite code I have written and have run into issues when testing with non-English characters, mostly with getting the proper glyph indices back for text made of multiple Unicode code points.

EDIT: After further research, I believe the issue is diacritics: the extra character should be combined with the base character somehow. The isDiacritic field of DWRITE_SHAPING_GLYPH_PROPERTIES does return 1 for the last Unicode code point. However, the shaping process doesn't seem to take this into account at all: GetGlyphPlacements returns 0 for both the advance and the offset of the diacritic glyph. The LSB is around -5, but that's not enough to shift the glyph to the correct position. Does anyone know where in the shaping process DirectWrite is supposed to take diacritics into account, and how?

Consider this character: œ̃

It is displayed as one character (in most text editors), but it consists of two code points: U+0153 U+0303
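
The two code points can be inspected with Python's standard unicodedata module. This is only an illustrative sketch (the helper name describe is made up for this example); it also shows that NFC normalization cannot merge the pair, since Unicode has no precomposed "œ with tilde" character.

```python
import unicodedata

text = "\u0153\u0303"  # œ followed by a combining tilde


def describe(s):
    """Return (code point, Unicode name, combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
            for ch in s]


for entry in describe(text):
    print(entry)

# NFC leaves the sequence untouched because no precomposed form exists,
# so the shaper always receives two code points for this character.
nfc = unicodedata.normalize("NFC", text)
print(len(text), len(nfc))
```

A non-zero combining class on the second code point marks it as a combining mark that attaches to the preceding base character.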

How do I account for this in GetGlyphs(), since they are separate code points? In my code, it returns two different glyph indices (177, 1123) and a cluster map of (0, 0), i.e. one cluster.

This is what ends up getting rendered:

This is consistent with both code points being rendered individually, but it is not the actual character. The actual glyph count returned by GetGlyphs() is 2.

My questions are as follows:

Should this be returning one index from GetGlyphs()?
Should I even be getting one index, or is there some magic involved with two different indices, where at some stage in the process they are combined in the glyph run?
If I should be getting one index, in which process/functions are these indices combined? Perhaps there is a bug in my ScriptAnalysis? I am trying to narrow down where the issue may be.
Should I be using the length of the characters and not include codepoints?

I apologize as I am not super knowledgeable about fonts/Unicode and the inner workings of the whole shaping process.

Here is some of my code for the process I use to get the indices and advances:

text_length = len(text.encode('utf-16-le')) // 2
text_buffer = create_unicode_buffer(text, text_length)

self._text_analysis.GenerateResults(self._analyzer, text_buffer, len(text_buffer))

# Glyph buffer size estimate recommended by Microsoft: 3 * textLength / 2 + 16.
max_glyph_size = 3 * text_length // 2 + 16

length = text_length
clusters = (UINT16 * length)()
text_props = (DWRITE_SHAPING_TEXT_PROPERTIES * length)()
indices = (UINT16 * max_glyph_size)()
glyph_props = (DWRITE_SHAPING_GLYPH_PROPERTIES * max_glyph_size)()
actual_count = UINT32()

self._analyzer.GetGlyphs(text_buffer,
                         len(text_buffer),
                         self.font.font_face,
                         False,  # sideways
                         False,  # rtl
                         self._text_analysis.script,  # scriptAnalysis
                         None,  # localeName
                         None,  # numberSubstitution
                         None,  # features (typographic features)
                         None,  # featureRangeLengths
                         0,  # featureRanges (number of feature ranges)
                         max_glyph_size,  # maxGlyphCount
                         clusters,  # clusterMap
                         text_props,  # textProps
                         indices,  # glyphIndices
                         glyph_props,  # glyphProps
                         byref(actual_count)  # actualGlyphCount
                         )

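# Placement buffers are indexed per glyph, so size them by the glyph count
# that GetGlyphs reported rather than by the text length.
glyph_count = actual_count.value
advances = (FLOAT * glyph_count)()
offsets = (DWRITE_GLYPH_OFFSET * glyph_count)()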
self._analyzer.GetGlyphPlacements(text_buffer,
                                  clusters,
                                  text_props,
                                  text_length,
                                  indices,
                                  glyph_props,
                                  actual_count,
                                  self.font.font_face,
                                  self.font.font_metrics.designUnitsPerEm,
                                  False, False,
                                  self._text_analysis.script,
                                  self.font.locale,
                                  None,
                                  None,
                                  0,
                                  advances,
                                  offsets)
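
For reference, here is a minimal sketch of how the advances and offsets are normally turned into glyph origins within one left-to-right run. The function name glyph_origins and the plain tuples standing in for DWRITE_GLYPH_OFFSET are assumptions made for this example, not part of my code. It assumes the documented meaning of the two offset fields: advanceOffset moves along the run direction and a positive ascenderOffset moves the glyph up.

```python
def glyph_origins(advances, offsets, start_x=0.0, baseline_y=0.0):
    """Compute the drawing origin of each glyph in a left-to-right run.

    advances: one advance width per glyph.
    offsets:  one (advance_offset, ascender_offset) pair per glyph,
              standing in for DWRITE_GLYPH_OFFSET.
    """
    origins = []
    pen_x = start_x
    for advance, (advance_offset, ascender_offset) in zip(advances, offsets):
        # y grows downward in Direct2D, so a positive ascender offset
        # (which moves the glyph up) is subtracted from the baseline.
        origins.append((pen_x + advance_offset, baseline_y - ascender_offset))
        # The pen only moves by the advance; offsets never accumulate.
        pen_x += advance
    return origins


# Hypothetical numbers: a base glyph with advance 10 followed by a mark
# with zero advance and zero offset, as in the output described above.
print(glyph_origins([10.0, 0.0], [(0.0, 0.0), (0.0, 0.0)]))
```

With zero offsets the mark's origin sits at the pen position after the base glyph, so only the mark's own negative left side bearing pulls it back over the base. That relationship exists only while both glyphs are positioned relative to the same pen.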

EDIT: Here is rendering code:

def render_single_glyph(self, font_face, indice, advance, offset, metrics):
    """Renders a single glyph using D2D DrawGlyphRun"""
    glyph_width, glyph_height, lsb, font_advance = metrics

    # Slicing an array turns it into a python object. Maybe a better way to keep it a ctypes value?
    new_indice = (UINT16 * 1)(indice)
    new_advance = (FLOAT * 1)(advance)

    run = self._get_single_glyph_run(font_face,
                                     self.font._real_size,
                                     new_indice,  # indice,
                                     new_advance,  # advance,
                                     pointer(offset),  # offset,
                                     False,
                                     False)


    offset_x = 0
    if lsb < 0:
        # Negative LSB: we shift the layout rect to the right
        # Otherwise we will cut the left part of the glyph
        offset_x = math.ceil(abs(lsb))

    font_height = (self.font.font_metrics.ascent + self.font.font_metrics.descent) * self.font.font_scale_ratio

    # Create new bitmap.
    self._create_bitmap(int(math.ceil(glyph_width)),
                        int(math.ceil(font_height)))

    # This offsets the characters if needed.
    point = D2D_POINT_2F(offset_x, int(math.ceil(font_height)))

    self._render_target.BeginDraw()

    self._render_target.Clear(transparent)

    self._render_target.DrawGlyphRun(point,
                                     run,
                                     self.brush,
                                     DWRITE_MEASURING_MODE_NATURAL)

    self._render_target.EndDraw(None, None)
    image = wic_decoder.get_image(self._bitmap)

    glyph = self.font.create_glyph(image)
    glyph.set_bearings(self.font.descent, offset_x, round(advance * self.font.font_scale_ratio))  # baseline, lsb, advance
    return glyph

Solution

The shaping process is controlled by your inputs: text, font, locale, script, and user features. All of these affect the results you get. To answer your questions specifically:

Should this be returning one index from GetGlyphs()?

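That mostly depends on your font. If the font defines a substitution that maps the base-plus-mark sequence to a single glyph, you get one index; otherwise you get one glyph per code point.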

Should I even be getting one index, or is there some magic involved with two different indices, where at some stage in the process they are combined in the glyph run?

GetGlyphs() operates on a single run. Glyphs are free to form a cluster according to the shaping rules defined per script and the transformations defined in the font.
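
To make the cluster relationship concrete, here is a minimal sketch of reading a cluster map, assuming a left-to-right run where each entry holds the index of the first glyph of the cluster that contains that UTF-16 code unit. The helper name split_clusters is hypothetical and the function works on plain Python lists, not on the ctypes arrays from the question.

```python
def split_clusters(cluster_map, glyph_count):
    """Group text positions and glyphs into clusters.

    cluster_map[i] is the index of the first glyph of the cluster that
    contains UTF-16 code unit i (left-to-right run assumed).
    Returns (text_start, text_end, glyph_start, glyph_end) tuples,
    with both end values exclusive.
    """
    clusters = []
    text_start = 0
    for i in range(1, len(cluster_map) + 1):
        at_end = i == len(cluster_map)
        if at_end or cluster_map[i] != cluster_map[text_start]:
            glyph_start = cluster_map[text_start]
            # The cluster's glyphs run up to the next cluster's first glyph,
            # or to the end of the glyph array for the last cluster.
            glyph_end = glyph_count if at_end else cluster_map[i]
            clusters.append((text_start, i, glyph_start, glyph_end))
            text_start = i
    return clusters


# The values from the question: cluster map (0, 0) with two glyphs.
print(split_clusters([0, 0], 2))
```

For the question's output this yields a single cluster that covers both code units and both glyphs, which means the two glyphs belong together and are meant to be positioned as one unit.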

If I should be getting one index, in which process/functions are these indices combined? Perhaps there is a bug in my ScriptAnalysis? I am trying to narrow down where the issue may be.

Basically, if your input arguments are correct, you get what you get as output; you can't really control the core of it. What you can do is test the output for the same text and font with Uniscribe, CoreText (macOS), and Chromium/Firefox (HarfBuzz) to see if they differ.

Should I be using the length of the characters and not include codepoints?

I didn't get this one.
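
If the question is about which length to pass as textLength, DirectWrite counts text in UTF-16 code units (WCHARs), not in code points or user-perceived characters. A small sketch of the difference in Python, where utf16_length is a hypothetical helper name:

```python
def utf16_length(s):
    """Number of UTF-16 code units, which is what a WCHAR-based API counts."""
    return len(s.encode("utf-16-le")) // 2


samples = {
    "oe + combining tilde": "\u0153\u0303",
    "grinning face emoji": "\U0001F600",
}

for label, s in samples.items():
    # len() counts code points in Python 3; the two numbers differ only
    # for characters outside the Basic Multilingual Plane.
    print(label, "code points:", len(s), "UTF-16 units:", utf16_length(s))
```

For the character in the question both counts are 2, so the length computed in the question's code is already the right value to pass.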