I've got document OCR working on an image. It works fine when the page contains words like "coffee" or "432", but when I try to OCR a string like "abc123", I get an "OCR Running Error".
// Load the scanned TIFF into a MODI document.
MODI.Document md = new MODI.Document();
md.Create("c:\\temp\\mpk.tiff");
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true); // <-- Error thrown here
MODI.Image image = (MODI.Image)md.Images[0];
// Write the recognized text to a file; CreateNew throws if the file already exists.
using (FileStream createFile = new FileStream("c:\\temp\\mpk.txt", FileMode.CreateNew))
using (StreamWriter writeFile = new StreamWriter(createFile))
{
    writeFile.Write(image.Layout.Text);
}
md.Close();
Surely MS didn't build this library to recognize only language-based words? Or did they? Am I missing a MODI.Document setting or something?
Any help would be appreciated.
Yes, they did. OCR gets really inaccurate when it has no relevant dictionary to check against and when it is given fragments that don't provide context. Humans have the same problem: ABC123, ABCI23, ABCl23. Those are three different strings. In practice this is solved by using special fonts that minimize the odds that letters and digits are ambiguous, the kind you see on a bank check.
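To make the ambiguity concrete, here is a minimal sketch that lists every string a reader could plausibly see in place of a given one. The class name `ConfusableStrings`, the method `Variants`, and the two look-alike groups are all made up for illustration; they are not part of MODI, and a real OCR engine's confusion set would be larger and font-dependent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConfusableStrings
{
    // Illustrative groups of look-alike glyphs; an assumption, not an exhaustive list.
    private static readonly string[] Groups = { "1Il", "0O" };

    // Returns every string that could be read in place of `text`
    // when each character may be mistaken for any glyph in its group.
    public static List<string> Variants(string text)
    {
        var results = new List<string> { "" };
        foreach (char c in text)
        {
            // Characters outside every group can only be read as themselves.
            string group = Groups.FirstOrDefault(g => g.IndexOf(c) >= 0) ?? c.ToString();
            results = results
                .SelectMany(prefix => group.Select(g => prefix + g))
                .ToList();
        }
        return results;
    }
}
```

Calling `ConfusableStrings.Variants("ABC123")` gives the three readings ABC123, ABCI23, and ABCl23. Without a dictionary or surrounding context, nothing in the image tells the engine which of them was meant.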