Failed to generate Tesseract traineddata


I'm using Tesseract v5.0.1.20220118 on Windows 10 to train a font that contains only the letters "P" and "Q".

When I get to the step

mftraining -F font_properties.txt -U unicharset -O normal.unicharset pq.normal.exp0.tr

The pffmtable file is not generated.

When I then run

cntraining pq.normal.exp0.tr

it prints:

Reading pq.normal.exp0.tr ...
Clustering ...
N == sizeof(Cluster->Mean):Error:Assert failed:in file ../../../src/classify/cluster.cpp, line 2526

Why does it go wrong? How can I fix it?

I only get inttemp and shapetable, but the tutorial says four files should be generated: shapetable, inttemp, pffmtable and normproto. I suspect this is because the font contains only the letters "P" and "Q", but I have no idea how to solve it.
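To confirm exactly which of the four expected outputs are missing after a run, a small check like the one below may help. It is only a sketch: the function name missing_training_outputs is made up for illustration, and it assumes a POSIX shell (for example Git Bash or WSL on Windows) and that the files are written to the directory passed as the first argument.

```shell
# Hypothetical helper (not part of Tesseract): print each expected
# legacy-training output that is absent or empty in the given directory.
missing_training_outputs() {
    dir="${1:-.}"
    for name in shapetable inttemp pffmtable normproto; do
        # -s is true only for files that exist and are non-empty
        [ -s "$dir/$name" ] || echo "$name"
    done
}
```

Calling missing_training_outputs . in the training directory prints one missing file name per line and prints nothing when all four files are present.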


Solution

  • Please read the docs:

    https://tesseract-ocr.github.io/tessdoc/#training-for-tesseract-5

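    Use the right tools. mftraining and cntraining belong to the legacy-engine training workflow; training for Tesseract 5 is done with tesstrain: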

    https://github.com/tesseract-ocr/tesstrain
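As a rough sketch of what the tesstrain route could look like: tesstrain expects line images with matching .gt.txt transcriptions in data/MODEL_NAME-ground-truth and is driven by make training. The model name pq, the start model eng and both function names below are assumptions for illustration, not something from the question; check the tesstrain README for the variables your setup needs (for example TESSDATA). The helper only handles plain .tif and .png names.

```shell
# Hypothetical helper: list line images in a ground-truth folder that have
# no matching .gt.txt transcription (plain .tif / .png names only).
images_without_transcription() {
    gt_dir="$1"
    for img in "$gt_dir"/*.tif "$gt_dir"/*.png; do
        [ -e "$img" ] || continue      # skip glob patterns that matched nothing
        base="${img%.*}"               # strip the image extension
        [ -f "$base.gt.txt" ] || echo "$img"
    done
}

# Hypothetical wrapper, meant to be run from inside a tesstrain checkout.
# MODEL_NAME=pq and START_MODEL=eng are assumed values.
train_pq() {
    missing=$(images_without_transcription data/pq-ground-truth)
    if [ -n "$missing" ]; then
        echo "images without transcription:" >&2
        echo "$missing" >&2
        return 1
    fi
    make training MODEL_NAME=pq START_MODEL=eng
}
```

Running train_pq from the tesstrain directory first reports any image that lacks a transcription and only then starts the training target.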