ubuntu-16.04 ocr tesseract

List custom fonts in tesseract-ocr/langdata/font_properties?


I am using Tesseract 4.0.0-beta.1-370-g8b64 on Ubuntu 16.04, built from source. I have a directory of font files, and according to the fonts documentation, custom fonts need to be listed in training/language_specific.sh and langdata/font_properties. Entries in font_properties apparently follow a particular format, but I can't find that format documented anywhere. Is there a link or a set of instructions explaining how to do it?


Solution

  • It's described in the Tesseract training wiki:

    https://github.com/tesseract-ocr/tessdoc/blob/master/tess3/Training-Tesseract-3.03%E2%80%933.05.md#the-font_properties-file

    Each line of the font_properties file is formatted as follows: fontname italic bold fixed serif fraktur where fontname is a string naming the font (no spaces allowed!), and italic, bold, fixed, serif and fraktur are all simple 0 or 1 flags indicating whether the font has the named property.

    Example:

    timesitalic 1 0 0 1 0
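
To make the quoted format concrete, here is a minimal Python sketch that parses and writes font_properties lines. It assumes only the six whitespace-separated fields described above (a name followed by five 0/1 flags). The names `FontProperties`, `parse_font_properties_line` and `format_font_properties_line` are hypothetical helpers made up for this illustration; they are not part of Tesseract or its training tools.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    """One font_properties entry (hypothetical helper type, not a Tesseract API)."""
    fontname: str
    italic: bool
    bold: bool
    fixed: bool
    serif: bool
    fraktur: bool


def parse_font_properties_line(line: str) -> FontProperties:
    """Parse 'fontname italic bold fixed serif fraktur' into a FontProperties."""
    fields = line.split()
    # A font name containing spaces would produce extra fields and fail here.
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, *flags = fields
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    return FontProperties(name, *(flag == "1" for flag in flags))


def format_font_properties_line(props: FontProperties) -> str:
    """Render a FontProperties back into a single font_properties line."""
    flags = " ".join(str(int(flag)) for flag in props[1:])
    return f"{props.fontname} {flags}"


if __name__ == "__main__":
    entry = parse_font_properties_line("timesitalic 1 0 0 1 0")
    print(entry)
    print(format_font_properties_line(entry))
```

Rejecting lines that do not have exactly six fields is a design choice in this sketch: it catches font names that accidentally contain spaces, which the quoted documentation says are not allowed.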