I have just installed the most recent version of Tesseract on Windows 10, but I find it does not work with seanchló (old Irish script). Fortunately, someone has already done some work to address that, here:
https://github.com/kscanne/tesseract-gle-uncial
But it seems I need a .traineddata file, which doesn't appear to be in that fairly old repository. Does anyone know how I might be able to extract or generate this file and use it to read some 1910s-era documents?
Thanks very much!
I found this release by one of the contributors to the repository you linked. It seems to be a fork of the repo and contains the gle_uncial.traineddata file.
jimregan/tesseract-gle-uncial/releases
To use it, just copy it to wherever your tessdata folder is and pass its name (gle_uncial, without the .traineddata extension) as the -l argument to Tesseract OCR.
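If it helps, the two steps above could be scripted roughly like this in Python. This is only a sketch under a few assumptions: the tessdata path, the image name page.png, and the helper names install_traineddata and build_command are all made up for illustration, so adjust them to your own install. It also assumes the tesseract executable is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def install_traineddata(traineddata, tessdata_dir):
    """Copy the .traineddata file into the tessdata folder; return the new path."""
    tessdata_dir = Path(tessdata_dir)
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(traineddata, tessdata_dir))


def build_command(image, output_base, tessdata_dir, lang="gle_uncial"):
    """Build the Tesseract command line.

    The language name given to -l is the .traineddata file name
    without its extension.
    """
    return [
        "tesseract",
        str(image),
        str(output_base),  # Tesseract writes <output_base>.txt
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
    ]


if __name__ == "__main__":
    # Assumed locations; writing under Program Files may need admin rights.
    tessdata = Path(r"C:\Program Files\Tesseract-OCR\tessdata")
    install_traineddata(Path("gle_uncial.traineddata"), tessdata)

    # page.png is a placeholder for one of your scanned documents.
    cmd = build_command(Path("page.png"), Path("page"), tessdata)
    subprocess.run(cmd, check=True)
```

Passing --tessdata-dir explicitly is optional if the file is already in the default tessdata folder; it is included here only so the script does not depend on where Tesseract looks by default.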