linux fonts opentype truetype

Finding out what characters a given font supports


How do I extract the list of supported Unicode characters from a TrueType or Embedded OpenType font on Linux?

Is there a tool or a library I can use to process a .ttf or a .eot file and build a list of code points (like U+0123, U+1234, etc.) provided by the font?


Solution

  • Here is a method using the fontTools Python library (which you can install with something like pip install fonttools):

    #!/usr/bin/env python
    from itertools import chain
    import sys
    
    from fontTools.ttLib import TTFont
    from fontTools.unicode import Unicode
    
    with TTFont(
        sys.argv[1], 0, allowVID=0, ignoreDecompileErrors=True, fontNumber=-1
    ) as ttf:
        # Walk every cmap subtable and yield (code point, glyph name, Unicode name).
        chars = chain.from_iterable(
            [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables
        )
        if len(sys.argv) == 2:  # print all code points
            for c in chars:
                print(c)
        elif len(sys.argv) >= 3:  # search code points / characters
            code_points = {c[0] for c in chars}
            for i in sys.argv[2:]:
                code_point = int(i)   # argument is a decimal code point, e.g. 65
                #code_point = ord(i)  # use this instead if the argument is a literal character
                print(Unicode[code_point])
                print(code_point in code_points)
    

    The script takes the font path as its first argument. Any further arguments are decimal code points to search for (or literal characters, if you switch to the commented-out ord(i) line):

    $ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf
    (32, 'space', 'SPACE')
    (33, 'exclam', 'EXCLAMATION MARK')
    (34, 'quotedbl', 'QUOTATION MARK')
    …
    
    $ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf 65 12622  # A ㅎ
    LATIN CAPITAL LETTER A
    True
    HANGUL LETTER HIEUH
    False
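
Since the question asks for code points in the U+0123 form, here is a minimal sketch of a variation on the script above. The helper names (parse_code_point, format_code_point, supported_code_points) are made up for illustration and are not part of fontTools. It also assumes that the single subtable picked by TTFont.getBestCmap() is enough for your purposes, whereas the script above walks every cmap subtable, so the two may not list exactly the same code points.

```python
def parse_code_point(arg):
    """Interpret a command-line argument as a code point (hypothetical helper).

    Accepts 'U+314E' style hex, a multi-digit decimal number such as '12622',
    or a single literal character such as 'A'. A lone digit like '5' is
    treated as the character '5', not as code point 5.
    """
    if arg.upper().startswith("U+"):
        return int(arg[2:], 16)
    if len(arg) > 1 and arg.isdecimal():
        return int(arg)
    if len(arg) == 1:
        return ord(arg)
    raise ValueError(f"cannot interpret {arg!r} as a code point")


def format_code_point(code_point):
    """Render a code point in the usual U+XXXX notation (at least 4 hex digits)."""
    return f"U+{code_point:04X}"


def supported_code_points(font_path):
    """Return the sorted code points from the font's preferred Unicode cmap."""
    # Imported here so the two helpers above can be used without fontTools.
    from fontTools.ttLib import TTFont

    with TTFont(font_path) as font:
        best = font.getBestCmap()  # dict of code point -> glyph name, or None
    return sorted(best) if best else []
```

With these in place, something like `for cp in supported_code_points(path): print(format_code_point(cp))` should print one U+XXXX entry per line, and `parse_code_point(arg) in set(supported_code_points(path))` can stand in for the search branch of the original script.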