ocr  azure-cognitive-services  azure-form-recognizer

Sample Labeling Tool OCR Text Detection Problems


I have a question regarding Azure Form Recognizer's OCR with handwritten text.

When running OCR on handwritten PDF files before labeling in Azure's Sample Labeling Tool, the OCR often detects text incorrectly. With other form analysis and extraction technologies, an option is often provided to enter the text that was supposed to be detected to essentially "correct" the OCR. For training Azure Form Recognizer in the Sample Labeling Tool (Docker image), I do not see a way for me to override the OCR text and enter the correct text.

Is there a way I can enter the text myself that the OCR is failing to detect or detecting incorrectly?

For example, this is what the OCR in Azure's Sample Labeling Tool picked up: [image: OCR detection sample]

Is there a way to correct this result and tell Form Recognizer that the text should be: "Bridget Sims, MD"?


Solution

  • Currently there is no way to correct the OCR result and improve its accuracy right away. The typical scenario is to train a Form Recognizer model from a small set of training files and then use it to process more documents. During training, a small number of OCR errors does not materially affect model quality, so you can ignore them. The product team is working on a new version of OCR with better handwriting recognition accuracy.

    thanks -xin [Microsoft Azure Form Recognizer Team]
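
Not part of the answer above, but one way to judge whether the OCR errors in a training set are "small" enough to ignore is to compare each OCR reading with the text you expected. The following is a minimal sketch using only Python's standard library. It assumes you keep your own list of (OCR text, intended text) pairs. The function names, the 0.8 threshold, and the sample OCR readings are all hypothetical, and nothing here calls Form Recognizer or changes what the Sample Labeling Tool stores.

```python
from difflib import SequenceMatcher


def ocr_similarity(ocr_text: str, intended_text: str) -> float:
    """Return a 0..1 similarity ratio between OCR output and the intended text."""
    # Case is ignored so that "MD" vs "md" does not count as an error.
    return SequenceMatcher(None, ocr_text.lower(), intended_text.lower()).ratio()


def flag_poor_ocr(pairs, threshold=0.8):
    """Return (ocr, intended, score) for every pair scoring below the threshold."""
    flagged = []
    for ocr_text, intended_text in pairs:
        score = ocr_similarity(ocr_text, intended_text)
        if score < threshold:
            flagged.append((ocr_text, intended_text, score))
    return flagged


# Hypothetical OCR readings; only the intended text comes from the question.
samples = [
    ("Bridget Sims, MD", "Bridget Sims, MD"),  # read correctly
    ("Bnidget Sims, MD", "Bridget Sims, MD"),  # one wrong character
    ("B__ S_", "Bridget Sims, MD"),            # mostly unreadable
]

if __name__ == "__main__":
    for ocr_text, intended_text, score in flag_poor_ocr(samples):
        print(f"Review: {ocr_text!r} (expected {intended_text!r}, score {score:.2f})")
```

Documents with many flagged fields are candidates for leaving out of the training set or rescanning. Fields with only an occasional wrong character fall into the category the answer says can be ignored.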