Tags: python, unicode, freetype, freetype2, freetype-py

What is the meaning of "uniE0A1" as returned by FT_Get_Glyph_Name?


This question is probably the product of my not understanding something fundamental, but I could really do with some help, so here goes.

While trying to wrap my head around text rendering, FreeType, etc., I came across strange glyphs that, as I understand it, report themselves as being associated with a Unicode code point, yet when I check from the Unicode side, no character is assigned to that code point.

For example, with the font "Hack", the glyph at index 1437 is one of these mystery glyphs; see below for what it looks like.

Here is some demonstration code using freetype-py, the Python wrapper for FreeType.

First, as an example of what looks plausible (and applies to >99% of glyphs), let's look at the letter "A":

import numpy as np
import freetype as FT
import unicodedata

ff = FT.Face('/usr/share/fonts/truetype/Hack-Regular.ttf')
ff.set_char_size(12<<6)   # 12 pt; char sizes are given in 1/64-point units

ff.load_glyph(1425)       # load the glyph at index 1425
ff.get_glyph_name(1425)   # ask the font for that glyph's name
# b'uni0041'

Hex 41 is decimal 65, which is ASCII/Unicode for 'A', and the rendered bitmap also looks like an 'A'.

np.array(ff.glyph.bitmap.buffer).reshape(-1,8)   # view the bitmap buffer as rows of 8 pixels
# array([[  0,   0,  67, 255, 121,   0,   0,   0],
#        [  0,   0, 143, 213, 198,   0,   0,   0],
#        [  0,   0, 218,  85, 250,  21,   0,   0],
#        [  0,  38, 248,   9, 203,  95,   0,   0],
#        [  0, 115, 191,   0, 136, 171,   0,   0],
#        [  0, 191, 125,   0,  69, 242,   5,   0],
#        [ 15, 250, 252, 252, 252, 255,  68,   0],
#        [ 87, 231,   2,   0,   0, 178, 145,   0],
#        [162, 152,   0,   0,   0,  97, 221,   0]])
unicodedata.name(chr(0x0041))
# 'LATIN CAPITAL LETTER A'
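
As an aside, glyph names of the form "uniXXXX" simply encode a code point as four hex digits after the "uni" prefix, so the number can be parsed back out; here is a small helper of my own (not part of freetype-py):

def codepoint_from_glyph_name(name):
    """Parse a 'uniXXXX'-style glyph name (bytes or str) into an int code point."""
    if isinstance(name, bytes):
        name = name.decode('ascii')
    if not name.startswith('uni'):
        raise ValueError('not a uniXXXX-style name: %r' % name)
    return int(name[3:], 16)

codepoint_from_glyph_name(b'uni0041')
# 65  (== 0x41 == ord('A'))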

Now let's do the same for glyph index 1437:

ff.load_glyph(1437)
ff.get_glyph_name(1437)
# b'uniE0A1'
np.array(ff.glyph.bitmap.buffer).reshape(-1,5)   # this bitmap is only 5 pixels wide
# array([[ 56,  70,   0,   0,   0],
#        [112, 140,   0,   0,   0],
#        [112, 140,   0,   0,   0],
#        [112, 140,   0,   0,   0],
#        [112, 140,   0,   0,   0],
#        [112, 140,   0,   0,   0],
#        [105, 232, 224, 178,   0],
#        [  0, 168, 150,  40, 216],
#        [  0, 168, 241,  46, 216],
#        [  0, 168, 223, 124, 216],
#        [  0, 168, 131, 215, 216],
#        [  0, 168,  81, 212, 216],
#        [  0, 168,  84, 108, 216]])
unicodedata.name(chr(0xE0A1))
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: no such name

So the glyph calls itself "uniE0A1", but as I said, Unicode defines no character at that code point (I double-checked: it is not in UnicodeData.txt, version 12 I think), and I do not recognize the bitmap.

This question is loosely related to "Why does num_glyphs not match the number of glyphs enumerated by FT_Get_First_Char / FT_Get_Next_Char", another example of things not adding up.


Solution

  • The code point U+E0A1 lies in the Private Use Area (U+E000–U+F8FF). Unicode deliberately assigns no characters there, which is why unicodedata.name() raises ValueError, and a font is free to use those code points for its own custom glyphs; a quick check is sketched below the chart link.

    https://www.unicode.org/charts/PDF/UE000.pdf
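
    A quick way to confirm this from Python (a sketch that assumes the same Hack face as in the question; the expected index 1437 depends on that particular Hack build): unicodedata.category() reports 'Co' ("Other, Private Use") for such code points even though unicodedata.name() raises, and the font's character map ties U+E0A1 to the mystery glyph:

    import unicodedata
    import freetype as FT

    ff = FT.Face('/usr/share/fonts/truetype/Hack-Regular.ttf')

    # The code point is perfectly valid; it just has no assigned character or name.
    unicodedata.category(chr(0xE0A1))
    # 'Co'  (Other, Private Use)

    # The font's charmap maps that private-use code point to the glyph in question.
    ff.get_char_index(0xE0A1)
    # 1437  (with the Hack build used in the question)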