I am learning about OCR and trying to read some text from an image with a changing background.
I am using a bitmap to take a screenshot and then feed it to IronOCR to recognize the characters in the image.
using System.Drawing;
using System.Drawing.Imaging;
using IronOcr;

// Select the screen area to capture
Rectangle bounds = new Rectangle(830, 980, 270, 100);
using (Bitmap bitmap = new Bitmap(bounds.Width, bounds.Height))
{
    // SetResolution only sets the DPI metadata; the pixel dimensions are unchanged
    bitmap.SetResolution(500, 500);
    using (Graphics g = Graphics.FromImage(bitmap))
    {
        // Copy the selected screen region into the bitmap
        g.CopyFromScreen(new Point(bounds.Left, bounds.Top), Point.Empty, bounds.Size);
    }
    // Save the image
    bitmap.Save(@"testimages\1.tiff", ImageFormat.Tiff);
}
// Read the characters from the saved image
var Ocr = new IronTesseract();
using (var Input = new OcrInput(@"testimages\1.tiff"))
{
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
This is what the image looks like: IMAGE
The background of the image changes slightly between captures, but the text stays the same. I can also change the text to something easier to recognize (for example, "X X X X X X X X X X" instead of "--SOME TEXT HERE --").
How can I make the OCR more reliable in code, and is there anything in the image-capture step that would improve the results?
Ultimately, my goal is to determine with at least 95% accuracy that this is the text that appeared.
If I run this 5 times, these are the outputs:
ATTEMPT 1:
) 3-‘§0ME’TEXT;}TERE --;
P LW hl
ATTEMPT 2:
: SRR TS o ' A \
ATTEMPT 3:
L;.,Q{SOMEYEXT (]3]
Ty
ATTEMPT 4:
‘GEE UG
ATTEMPT 5:
N TR
If anyone else runs into this, what helped me was Input.Invert(), which inverts every color (white becomes black and black becomes white). This improved my results significantly.
// Ocr is the IronTesseract instance created earlier
using (var Input = new OcrInput(@"testimages\image1.tiff"))
{
    Input.EnhanceResolution();
    Input.Contrast();
    Input.Invert(); // white becomes black, black becomes white
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
}
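Since the expected text is known in advance, the final check does not have to be an exact string comparison. One possible approach (a minimal sketch, not part of IronOCR) is to strip everything except letters and digits from the OCR output and score it against the expected text with a Levenshtein edit distance. The `OcrMatch` class, its method names, and the 0.8 threshold below are all hypothetical and would need tuning against real captures.

```csharp
using System;
using System.Text;

static class OcrMatch
{
    // Classic dynamic-programming Levenshtein edit distance.
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Similarity in [0, 1]; 1.0 means identical after normalization.
    public static double Similarity(string ocrText, string expected)
    {
        string a = Normalize(ocrText);
        string b = Normalize(expected);
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / longest;
    }

    // Keep only letters and digits, upper-cased, so stray punctuation is ignored.
    static string Normalize(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (char.IsLetterOrDigit(c)) sb.Append(char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }
}
```

It could then be used after the read, for example `bool matched = OcrMatch.Similarity(Result.Text, "--SOME TEXT HERE --") >= 0.8;`. The threshold trades false positives against false negatives, so it is worth measuring on a batch of saved captures before relying on it.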