What is font_properties in making traineddata for Tesseract OCR?


I'm trying to create a traineddata file to teach Tesseract how to read the images I will feed it, but I don't understand what to include in the font_properties step. I'm following this example and the answer to this post. Both examples only use 0 and 1 as values for font_properties, and my traineddata file is for specific alphanumeric values. Could you tell me more about what to include in step 3 of the second link? Can it be anything, is it just a plain description of the font, or is it actually important and does it need to be accurate?


Solution

  • Each line of the font_properties file describes one font and has the form: fontname italic bold fixed serif fraktur. Here fontname is a string naming the font (no spaces allowed!), and italic, bold, fixed, serif and fraktur are each a simple 0 or 1 flag indicating whether the font has the named property (fixed means fixed-pitch, i.e. monospaced).

    Example (an italic serif font that is not bold, fixed-pitch or Fraktur):

    timesitalic 1 0 0 1 0

    https://tesseract-ocr.github.io/tessdoc/tess3/Training-Tesseract-3.03%E2%80%933.05.html#set_unicharset_properties
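If you have several fonts, it may be easier to generate the lines than to type them by hand. Below is a minimal Python sketch of the format described above. The `FontProperties` type and the `parse_line` helper are hypothetical names made up for this illustration; they are not part of Tesseract. As far as I understand the Tesseract 3.x training docs, the fontname should also match the font part of your training file names (for example `eng.timesitalic.exp0`), so treat it as an identifier rather than a free-form description.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    """One line of a font_properties file (hypothetical helper, not a Tesseract API)."""
    fontname: str
    italic: bool = False
    bold: bool = False
    fixed: bool = False
    serif: bool = False
    fraktur: bool = False

    def to_line(self) -> str:
        # The font name is a single token, so whitespace is not allowed.
        if not self.fontname or any(ch.isspace() for ch in self.fontname):
            raise ValueError("fontname must be non-empty and contain no whitespace")
        flags = (self.italic, self.bold, self.fixed, self.serif, self.fraktur)
        # Each flag is written as 0 or 1, in the fixed order shown above.
        return " ".join([self.fontname, *("1" if flag else "0" for flag in flags)])


def parse_line(line: str) -> FontProperties:
    """Parse 'fontname italic bold fixed serif fraktur' back into a FontProperties."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError("expected a font name followed by exactly five flags")
    name, *flags = parts
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    return FontProperties(name, *(flag == "1" for flag in flags))


# Reproduces the example line from the answer above.
print(FontProperties("timesitalic", italic=True, serif=True).to_line())
# timesitalic 1 0 0 1 0
```

Writing one such line per font into a file named font_properties gives you the input for that training step; the flags only need to reflect what the font actually looks like.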