c# ocr tesseract

Tesseract OCR reads "M" as "IVI"


Hi all, I have a problem with Tesseract OCR for C# (tessnet2): it recognizes the character "M" as "IVI". Can you help me?

tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ"); // Uppercase letters only
ocr.Init(@"C:\tresnet", "fra", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(imgSortie, Rectangle.Empty);
String ListeLettres = "";

// Concatenate the text of every recognized word
foreach (tessnet2.Word word in result)
    ListeLettres += word.Text;


Solution

  • @user2094482 Hi,

    I worked on character recognition with Tesseract and C++, and I once faced the same problem: my system recognized |v| instead of M, even though the image looked clear to the naked eye. I tried several image preprocessing techniques, such as binarization and blurring, to get accurate results, but none of them was 100% accurate. I then tried whitelisting, and that worked.

    text  = readLettersFromTesseractOCR(img_bw,&error,CharacterSequence);
    

    CharacterSequence was initialized as below.

    CharacterSequence = ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<
    

    Hope this will work with your system as well.
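
Note that in the question's setup the letters I and V are themselves whitelisted, so a whitelist alone may still let "IVI" through. One possible fallback is a post-processing pass over the recognized text. The sketch below is a different technique from the whitelisting described above and is not taken from either post: `fixMisreadM` is a hypothetical helper, and the list of look-alike sequences is an assumption based only on the two misreads mentioned in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of tessnet2 or Tesseract): replaces
// character sequences that were reported in this thread as misreads of 'M'.
std::string fixMisreadM(const std::string& text) {
    // Assumed look-alike sequences; extend or trim for your own data.
    static const std::vector<std::string> lookalikes = {"IVI", "|v|", "|V|"};

    std::string out;
    out.reserve(text.size());

    std::size_t i = 0;
    while (i < text.size()) {
        bool replaced = false;
        for (const std::string& pat : lookalikes) {
            // compare() clamps the substring at the end of text, so a
            // pattern that would run past the end simply does not match.
            if (text.compare(i, pat.size(), pat) == 0) {
                out += 'M';
                i += pat.size();
                replaced = true;
                break;
            }
        }
        if (!replaced) {
            out += text[i];
            ++i;
        }
    }
    return out;
}
```

A blind replacement like this also rewrites legitimate occurrences (for example, "CIVIL" contains "IVI"), so it probably only makes sense when the expected text is constrained, or when combined with a dictionary check on the result.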