
Does the font name in Tesseract box/tif filenames matter?


The Tesseract wiki gives the filename format for labeled tif/box pairs used in training as [lang].[fontname].exp[num]. Does fontname actually affect training, or is it just for bookkeeping?

In my particular case, I have a large number of document images with different fonts (and I don't know which fonts are in them). Can I just use eng.idontknow.exp[num] for each document I label manually or will this mess up training for some reason? Thanks in advance!


Solution

  • The font name can be arbitrary, so a placeholder such as eng.idontknow.exp[num] is acceptable. Matching a real font is still preferable where you can, because it helps with any later post-OCR analysis.
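
If it helps, here is a minimal sketch of generating names in the [lang].[fontname].exp[num] pattern for a batch of manually labeled images. The helper names (training_basename, plan_names) and the sample filenames are hypothetical and not part of Tesseract. Rejecting dots inside the fields is only a precaution taken by this sketch, on the assumption that dots are what separate the fields.

```python
from pathlib import Path


def training_basename(lang: str, fontname: str, num: int) -> str:
    """Build the [lang].[fontname].exp[num] stem shared by a tif/box pair."""
    # Precaution of this sketch (an assumption, not a documented rule):
    # dots separate the fields, so a dot inside a field would be ambiguous.
    if "." in lang or "." in fontname:
        raise ValueError("lang and fontname must not contain '.'")
    return f"{lang}.{fontname}.exp{num}"


def plan_names(images, lang="eng", fontname="idontknow"):
    """Map each source image to the tif and box filenames it would receive."""
    plan = []
    # Sort first so the numbering is stable between runs.
    for num, image in enumerate(sorted(images)):
        stem = training_basename(lang, fontname, num)
        plan.append((Path(image).name, f"{stem}.tif", f"{stem}.box"))
    return plan


if __name__ == "__main__":
    # Hypothetical input names; nothing is renamed on disk here.
    for row in plan_names(["scan_b.tif", "scan_a.tif"]):
        print(row)
```

This only plans the names. Whether to rename or copy the files is left to you, and the same placeholder fontname is used throughout, so whatever else refers to the font during training would need to use that same name.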