c# char ocr tesseract whitelist

Special Character Whitelist with Tesseract (OCR)


I am trying to read some money values via OCR. The issue is that I want to tell the engine which characters it should recognize.

This is my current whitelist

       Version : Tesseract from Charles Weld v3.0.2
       tessedit_char_whitelist "0123456789,.$"

How do I include the cent sign (¢)?

Update 1: If I add ¢ to the list, it is still not recognized.


Solution

  • Okay, after failing to understand the question the first time, I have a more relevant answer.

    ocr.SetVariable("tessedit_char_whitelist", "0123456789,.$¢");
    

    Supply the name of the parameter and the value as strings, just as you would in a config file. E.g.

    SetVariable("tessedit_char_whitelist", "xyz"); to whitelist x, y and z. 
    

    Also make sure

    SetVariable("classify_bln_numeric_mode", "0"); // "1" turns numeric-only mode on, "0" turns it off
    

    to enable or disable numeric-only mode, whichever meets your needs. I would guess that in your case it should be disabled, because you are recognizing symbols as well as digits.

    Hope this helps! If not, let me know and I'll remove the answer. (I had to post an answer because I can't comment with under 50 rep; otherwise I would have commented first to get more information about the problem.) Cheers!
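
Putting the calls above together, here is a minimal sketch of how the whitelist and numeric-mode settings might be applied with Charles Weld's wrapper before processing an image. The tessdata folder, the image file name, and the class name are assumptions made for illustration, and the sketch only shows where the settings go; it does not guarantee that the loaded language data can actually produce ¢. The cent sign is written as the escape `\u00A2` so the whitelist does not depend on how the source file is encoded.

```csharp
using System;
using Tesseract; // Charles Weld's .NET wrapper (NuGet package "Tesseract")

class WhitelistDemo
{
    static void Main()
    {
        // Assumed locations: point these at your own tessdata folder and image.
        const string tessdataPath = @"./tessdata";
        const string imagePath = @"./price.png";

        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        {
            // Digits, separators, dollar sign and cent sign (U+00A2).
            bool whitelistSet = engine.SetVariable(
                "tessedit_char_whitelist", "0123456789,.$\u00A2");

            // "0" leaves numeric-only mode off, since symbols are expected too.
            bool numericModeSet = engine.SetVariable("classify_bln_numeric_mode", "0");

            // SetVariable reports whether the engine accepted the setting.
            if (!whitelistSet || !numericModeSet)
            {
                Console.Error.WriteLine("A Tesseract variable was not accepted.");
            }

            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(img))
            {
                // Print the recognized text so the effect of the whitelist can be inspected.
                Console.WriteLine(page.GetText());
            }
        }
    }
}
```

Setting both variables before calling `Process` keeps the configuration in one place, and checking the return values makes a mistyped variable name visible instead of silently ignored.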