I'm trying to set a whitelist and a blacklist for my OCR tool. This is important for me to optimize the results.
Using a whitelist or a blacklist has no effect. I read that this issue was fixed in Tesseract 4.1, but it's still not working. The variable gets set, but it has no effect on the result.
Let's say the real image text is "AB123CD". When I set the whitelist to "123", I expect Tesseract to recognize only these characters, so the expected result text is "123". In fact, it's still "AB123CD".
private TesseractEngine tesseract = new TesseractEngine(path, "eng", EngineMode.LstmOnly);

// Both lists are set before processing.
tesseract.SetVariable("tessedit_char_blacklist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract.SetVariable("tessedit_char_whitelist", "123");

using (var page = tesseract.Process(image, rec, PageSegMode.Auto))
{
    text = page.GetText(); // text still contains letters

    // Walk the result symbol by symbol.
    using (var iterator = page.GetIterator())
    {
        iterator.Begin();
        do
        {
            string symbol = iterator.GetText(PageIteratorLevel.Symbol);
        } while (iterator.Next(PageIteratorLevel.Symbol));
    }
}
Thanks for your help!
Maybe it just doesn't make sense to set both a whitelist and a blacklist.
Try setting only one of them, not both. For me it works fine. I'm using the latest version of Tesseract for C# (5.2.14), from the NuGet package TesseractOCR.
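For illustration, here is a minimal sketch of the "only one list" variant. It assumes the same wrapper API as the code in the question (the `Tesseract` namespace), not the TesseractOCR package, whose API differs. The tessdata folder and image path are hypothetical placeholders, so adjust them to your setup.

```csharp
using System;
using Tesseract; // assumption: the same wrapper the question's code uses

class WhitelistOnlyDemo
{
    static void Main(string[] args)
    {
        // Hypothetical defaults; pass your own tessdata folder and image path.
        string tessdataPath = args.Length > 0 ? args[0] : "./tessdata";
        string imagePath = args.Length > 1 ? args[1] : "sample.png";

        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.LstmOnly))
        {
            // Set only the whitelist and leave the blacklist untouched.
            bool accepted = engine.SetVariable("tessedit_char_whitelist", "123");
            Console.WriteLine("Whitelist variable accepted: " + accepted);

            using (var image = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(image, PageSegMode.Auto))
            {
                // Print whatever the engine recognized with the whitelist active.
                Console.WriteLine(page.GetText());
            }
        }
    }
}
```

Note that a `true` return from `SetVariable` only tells you the variable name and value were accepted, which matches what the question describes (the variable is set, but the output does not change). If letters still appear with only the whitelist set, it is worth checking which native Tesseract version your package actually bundles, since the question mentions the fix arriving in 4.1.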