I am using tessnet2 (tesseract-ocr) in C# on the following image:
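This is my code:
// Assumes "using System; using System.Drawing; using tessnet2;" at the top of the file.
var image = new Bitmap(@"D:\anuj\a2.jpg");
var ocr = new Tesseract(); // tessnet2.Tesseract; this declaration was missing from the snippet
ocr.Init(@"D:\anuj\OCRTest\tessdata", "eng", false);
// Rectangle.Empty means: run OCR on the whole image
var result = ocr.DoOCR(image, Rectangle.Empty);
foreach (Word word in result)
    Console.Write("{0} ", word.Text);
Console.ReadLine();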
This gives the output: Icurumcretz j
How can I get the correct text? The sample image is quite clear and has good resolution, yet the recognized text is still wrong. Which parameters need to be set to get the correct result?
You should try some image processing on your image to improve the output of tesseract. The OpenCV libraries (Emgu CV for C#, I think) can help you with those image processing steps. I applied a small medianBlur to the image to reduce the noise and then made a binary image out of it.
Testing this image with tesseract gives me the following output: laurumoretz, plus some gibberish on the next line because I did not remove the small blobs (characters from the sticker with the phone numbers). So it's off by one character, but I also did not apply any correction to make the text line fully straight.
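The two steps mentioned above (a small median blur followed by binarization) can be sketched in plain C# without OpenCV or Emgu CV. This is only an illustration under assumptions: the class and method names (OcrPreprocessing, MedianBlur3x3, Binarize) are hypothetical, the image is taken to be a grayscale byte[row, column] array, and the 3x3 window and the fixed threshold are guesses, since the kernel size and threshold actually used were not stated.

```csharp
using System;

// Hypothetical helper, not part of tessnet2, OpenCV or Emgu CV.
public static class OcrPreprocessing
{
    // 3x3 median filter on a grayscale image stored as gray[row, column].
    // Border pixels are copied unchanged to keep the sketch short.
    public static byte[,] MedianBlur3x3(byte[,] gray)
    {
        int height = gray.GetLength(0);
        int width = gray.GetLength(1);
        var result = (byte[,])gray.Clone();
        var window = new byte[9];

        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                // Collect the 3x3 neighbourhood around (y, x).
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[k++] = gray[y + dy, x + dx];

                Array.Sort(window);
                result[y, x] = window[4]; // middle of the 9 sorted values
            }
        }
        return result;
    }

    // Fixed-threshold binarization: pixels brighter than the threshold
    // become white (255), all others become black (0).
    public static byte[,] Binarize(byte[,] gray, byte threshold)
    {
        int height = gray.GetLength(0);
        int width = gray.GetLength(1);
        var result = new byte[height, width];

        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                result[y, x] = gray[y, x] > threshold ? (byte)255 : (byte)0;

        return result;
    }
}
```

To use something like this with tessnet2, the Bitmap would first have to be converted to a grayscale array and the processed array written back into a Bitmap before calling DoOCR. A threshold such as 128 is only a starting point and would need tuning per image.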
I hope this will give you a bit of an idea on how to improve the output of tesseract-ocr.