Installing a new font in Tesseract on Windows


I have just installed the most recent version of Tesseract on Windows 10, but I find it does not work with seanchló, or old Irish script. Fortunately, someone has already worked on support for it here:

https://github.com/kscanne/tesseract-gle-uncial

But it seems I need a .traineddata file, which doesn't appear to be in that fairly old repository. Does anyone know how I might be able to extract or generate this file and use it to read some 1910s-era documents?

Thanks very much!


Solution

  • I found this release by one of the contributors to the repository you linked. It appears to come from a fork of that repo, and it includes the gle_uncial.traineddata file.

    jimregan/tesseract-gle-uncial/releases

    To use it, copy the file into your tessdata folder, wherever that is on your machine, and pass its name (gle_uncial, without the .traineddata extension) to Tesseract with the -l option.
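The two steps above can be sketched in Python. This is only an illustration: the helper names `install_traineddata` and `tesseract_command` are made up for this example, and the paths in the usage note are assumptions (the tessdata location depends on how Tesseract was installed). It relies on the Tesseract convention that the value given to `-l` is the file name without `.traineddata`.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata: Path, tessdata_dir: Path) -> Path:
    """Copy a .traineddata file into the tessdata folder; return the new path."""
    if traineddata.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {traineddata}")
    if not tessdata_dir.is_dir():
        # Refuse to guess: a missing folder usually means the wrong path.
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")
    return Path(shutil.copy(traineddata, tessdata_dir))


def tesseract_command(image: Path, output_base: Path, traineddata: Path) -> list[str]:
    """Build the tesseract command line for the given language file."""
    # The -l value is the file name without the .traineddata extension,
    # e.g. gle_uncial.traineddata -> gle_uncial.
    lang = traineddata.stem
    return ["tesseract", str(image), str(output_base), "-l", lang]
```

For example, assuming Tesseract lives in `C:\Program Files\Tesseract-OCR` (a common default on Windows, but check your own install), you would call `install_traineddata(Path("gle_uncial.traineddata"), Path(r"C:\Program Files\Tesseract-OCR\tessdata"))`, then run the list returned by `tesseract_command(...)` with `subprocess.run(cmd, check=True)`. Copying into Program Files may require an elevated prompt.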