Tags: .net, ocr, modi

Can you use MODI OCR to recognize non-language-specific items?


I've got document OCR working on an image. It works fine when the page contains words like "coffee" or numbers like "432", but when I try to OCR a string like "abc123", I get an "OCR Running Error".

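// Requires a COM reference to the Microsoft Office Document Imaging (MODI) type library
// and "using System.IO;" at the top of the file.
MODI.Document md = new MODI.Document();
md.Create(@"c:\temp\mpk.tiff");

// Recognize with the English dictionary; the two flags enable auto-orientation and deskewing.
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);  // <-- Error thrown here

MODI.Image image = (MODI.Image)md.Images[0];

// FileMode.CreateNew throws an IOException if mpk.txt already exists.
// The using blocks flush and close the writer and stream even if Write fails.
using (FileStream createFile = new FileStream(@"c:\temp\mpk.txt", FileMode.CreateNew))
using (StreamWriter writeFile = new StreamWriter(createFile))
{
    writeFile.Write(image.Layout.Text);
}

md.Close();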

Surely MS didn't build this library to recognize only language-based words? Or did they? Am I missing a MODI.Document setting or something?

Any help would be appreciated.


Solution

  • Yes, they did. OCR accuracy drops sharply without a relevant dictionary, especially on short fragments that provide no surrounding context. Humans have the same problem: ABC123, ABCI23 and ABCl23 are three different strings that look nearly identical. In practice this is solved by printing with special fonts designed so that letters and digits can't be confused, like the characters along the bottom of a bank check.
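If you already know the format of the field you are reading, you can supply that missing context yourself by post-processing the recognized text against the expected layout. The sketch below is illustrative only: `NormalizeToTemplate` and its template convention ('A' = letter, '9' = digit, any other character must match literally) are hypothetical and not part of MODI, and the lookalike tables are a small assumed sample rather than an exhaustive list. It only helps once OCR has returned some text; it does nothing about an exception thrown by `md.OCR` itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

#nullable enable

// Hypothetical helper (not a MODI API): coerce OCR output into a known field format.
// Template: 'A' = letter position, '9' = digit position, anything else must match exactly.
// Returns null if the text cannot be made to fit the template.
static string? NormalizeToTemplate(string ocrText, string template)
{
    // Assumed sample of letters often misread where a digit is expected.
    var letterToDigit = new Dictionary<char, char>
    {
        ['I'] = '1', ['l'] = '1', ['|'] = '1',
        ['O'] = '0', ['o'] = '0',
        ['S'] = '5', ['B'] = '8', ['Z'] = '2'
    };
    // Assumed sample of digits often misread where a letter is expected.
    var digitToLetter = new Dictionary<char, char>
    {
        ['1'] = 'I', ['0'] = 'O', ['5'] = 'S', ['8'] = 'B', ['2'] = 'Z'
    };

    if (ocrText.Length != template.Length) return null;

    var result = new StringBuilder(ocrText.Length);
    for (int i = 0; i < ocrText.Length; i++)
    {
        char c = ocrText[i];
        char slot = template[i];

        if (slot == '9')
        {
            // Digit slot: keep ASCII digits, map known letter lookalikes to digits.
            if (c >= '0' && c <= '9') result.Append(c);
            else if (letterToDigit.TryGetValue(c, out char digit)) result.Append(digit);
            else return null;
        }
        else if (slot == 'A')
        {
            // Letter slot: keep letters (uppercased), map known digit lookalikes to letters.
            char upper = char.ToUpperInvariant(c);
            if (upper >= 'A' && upper <= 'Z') result.Append(upper);
            else if (digitToLetter.TryGetValue(c, out char letter)) result.Append(letter);
            else return null;
        }
        else
        {
            // Literal slot (e.g. '-'): the character must match exactly.
            if (c != slot) return null;
            result.Append(c);
        }
    }
    return result.ToString();
}

// The ambiguous readings from the answer, plus one with a digit in a letter slot.
foreach (var raw in new[] { "ABC123", "ABCI23", "ABCl23", "A8C123" })
{
    Console.WriteLine($"{raw} -> {NormalizeToTemplate(raw, "AAA999") ?? "(no match)"}");
}
```

With the template "AAA999", each of the four sample readings normalizes to "ABC123", while input of the wrong length or with an unmappable character (such as "X" in a digit slot) yields null. This only works because the caller provides the context (the field format) that a general-purpose dictionary cannot.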