I am new to Tesseract and I am working on a class project in which I need to scan number matrices. I have managed to read numbers from an image file, but I haven't yet found out how to recognize the spacing between digits. For example, I currently get 14610 for 1 4 6 10.
Image:
Code I am currently using:
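    using System;
    using System.Drawing;
    using System.Text;
    using tessnet2; // assumed: Tesseract and Word appear to be the tessnet2 wrapper types

    // Load the image that contains the number matrix
    Bitmap image = new Bitmap(file);

    var ocr = new Tesseract();
    ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // Digits only
    ocr.Init(@"C:\Users\MuhammadShahroz\Documents\Visual Studio 2013\Projects\ConsoleApplication3\tessdata", "eng", false);
    var results = ocr.DoOCR(image, Rectangle.Empty);

    // Collect every recognized word, separated by a single space
    var sb = new StringBuilder();
    foreach (Word word in results)
    {
        Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
        sb.Append(word.Text).Append(' ');
    }
    string mystring = sb.ToString().TrimEnd();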
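If Tesseract merges a whole row into a single word (for example `14610`), string handling alone cannot bring the spaces back, but the horizontal positions of the characters still show where the gaps are. The sketch below illustrates that idea under stated assumptions: `CharBox` is a hypothetical type holding a recognized character and its left and right pixel coordinates (the wrapper's own result types may expose positions differently, or not at all), the characters are assumed to be sorted left to right on one text line, and `minGap` is a hypothetical pixel threshold you would tune for your images.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for one recognized character and its horizontal extent in pixels.
public struct CharBox
{
    public char Symbol;
    public int Left;
    public int Right;

    public CharBox(char symbol, int left, int right)
    {
        Symbol = symbol;
        Left = left;
        Right = right;
    }
}

public static class GapSplitter
{
    // Groups characters (assumed sorted left to right, all on one text line)
    // into tokens, starting a new token whenever the blank gap between
    // consecutive characters is wider than minGap pixels.
    public static List<string> SplitByGaps(IList<CharBox> chars, int minGap)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        for (int i = 0; i < chars.Count; i++)
        {
            if (i > 0 && chars[i].Left - chars[i - 1].Right > minGap)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
            current.Append(chars[i].Symbol);
        }
        if (current.Length > 0)
            tokens.Add(current.ToString());
        return tokens;
    }
}
```

For characters at x-ranges [0, 8], [20, 28], [40, 48], [60, 66] and [68, 76] with `minGap` set to 5, the result is `1`, `4`, `6`, `10`: the 2-pixel gap between the last two characters keeps them in one number. A threshold relative to the typical character width would probably be more robust than a fixed pixel value.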
I think you need to set the variable preserve_interword_spaces to 1 (see the configuration source).
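In the question's setup this would presumably be one more call next to the whitelist, `ocr.SetVariable("preserve_interword_spaces", "1");`; whether it has any effect depends on the Tesseract version behind the .NET wrapper, which is worth checking. Once the spacing survives in the recognized text, turning it into matrix entries is a plain string split. A minimal sketch, assuming one text line per matrix row with entries separated by one or more spaces or tabs; `MatrixText.Parse` is a hypothetical helper, not part of any OCR library:

```csharp
using System;
using System.Linq;

public static class MatrixText
{
    // Parses OCR text where each line is one matrix row and entries are
    // separated by one or more spaces or tabs. Blank lines are skipped.
    public static int[][] Parse(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(cells => cells.Length > 0)
            .Select(cells => cells.Select(s => int.Parse(s)).ToArray())
            .ToArray();
    }
}
```

For example, `MatrixText.Parse("1 4 6 10")` returns a single row `{1, 4, 6, 10}`. Splitting on runs of whitespace, rather than on single spaces, matters here because preserving interword spaces can leave several spaces between entries.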