C# Tesseract: recognizing spaces between digits


I am new to Tesseract and am working on a class project in which I need to scan number matrices. I can read numbers from an image file, but I haven't yet found how to recognize the spacing between digits. For example, I currently get 14610 for 1 4 6 10.

Code I am currently using:

var image = new Bitmap(file);
var ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // digits only

ocr.Init(@"C:\Users\MuhammadShahroz\Documents\Visual Studio 2013\Projects\ConsoleApplication3\tessdata", "eng", false);
var results = ocr.DoOCR(image, Rectangle.Empty);

foreach (Word word in results)
{
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
    mystring += String.Format("{0} ", word.Text); // append each word instead of overwriting
}

Solution

  • I think you will need to set the variable preserve_interword_spaces to 1 (see the configuration source).
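
As a rough sketch of that suggestion, the variable could be set the same way the question already sets tessedit_char_whitelist. This reuses only the calls shown in the question. Whether the engine build behind this wrapper recognizes preserve_interword_spaces is an assumption (older Tesseract builds may not know the variable), and tessdataPath is a hypothetical placeholder for the tessdata folder.

```csharp
// Sketch only: same wrapper calls as in the question, plus one extra variable.
var ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // digits only

// Assumption: the underlying Tesseract build supports this variable.
ocr.SetVariable("preserve_interword_spaces", "1");

// tessdataPath is a hypothetical variable holding the tessdata directory.
ocr.Init(tessdataPath, "eng", false);
```

If the output is still 14610 after this change, the variable is probably not being applied by the engine version in use.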