Performing OCR on Seven Segment Text with Microsoft's Computer Vision?


I've been using Microsoft's Computer Vision OCR to extract text from various types of images, but I seem to have hit a bump in the road with Seven Segment font.

OCR doesn't recognize Seven Segmented Text

It can sometimes pick up on the digits, but it mostly gets them wrong.

Incorrect result from OCR

I've looked around and found some alternative methods, but would rather continue using the service we already have. Any suggestions?


Solution

  • After a month of research and experimentation, I'm going to share my findings and solutions here in case anyone else encounters the same or a similar problem.

    The Problem

    I needed a reliable way to extract the temperature from multiple types of Refrigeration Displays. Some of these displays used a standard font that Microsoft's Computer Vision had no trouble with, while others used a Seven-Segmented font.

    Due to the nature of Optical Character Recognition (OCR), Seven-Segmented font is not supported directly. To overcome this, you need to apply some image processing techniques to join the segmented text before passing it to the OCR.

    Solution Overview

    1. Create a Custom Vision Object Detection Model to extract the display from the image.
    2. Develop a Custom Vision Classification Model to determine the type of display.
    3. Depending on the classification, pass the image either to Tesseract along with a model specialized for digital text, or to Computer Vision when dealing with standard text.
    4. Apply regular expressions (Regex) to the output from Tesseract to extract the desired temperature.

    Solution Breakdown

    First, we pass the image into our Object Detection Model.

    Input: Original Image

    Object Detection Output: (image)
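
    For reference, here's a minimal sketch of that detection step using the Azure Custom Vision Python SDK. The endpoint, key, project ID, published iteration name, and file names are all placeholders:

        from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
        from msrest.authentication import ApiKeyCredentials
        from PIL import Image

        # Placeholder credentials/IDs -- substitute your own Custom Vision project.
        credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<PREDICTION_KEY>"})
        predictor = CustomVisionPredictionClient("<ENDPOINT>", credentials)

        with open("original.jpg", "rb") as f:
            results = predictor.detect_image("<PROJECT_ID>", "<PUBLISHED_ITERATION>", f.read())

        # Crop out the highest-confidence detection; bounding boxes are normalized 0-1.
        best = max(results.predictions, key=lambda p: p.probability)
        image = Image.open("original.jpg")
        w, h = image.size
        box = best.bounding_box
        display = image.crop((int(box.left * w), int(box.top * h),
                              int((box.left + box.width) * w),
                              int((box.top + box.height) * h)))
        display.save("display.jpg")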

    Then we pass that image into the Classification Model to determine the display type.

    Classification Output: Classification Result
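
    The classification call is almost identical (again a sketch; it reuses the predictor object from the detection step, and the project ID and tag names are placeholders):

        # Classify the cropped display using a separate Custom Vision project.
        with open("display.jpg", "rb") as f:
            results = predictor.classify_image("<CLASSIFIER_PROJECT_ID>", "<PUBLISHED_ITERATION>", f.read())

        top = max(results.predictions, key=lambda p: p.probability)
        display_type = top.tag_name  # e.g. 'Segmented' or 'Standard'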

    Next, we perform a series of image processing techniques (a rough code sketch follows the list), including:

    • Gaussian Blur and convert to grayscale: Blur & Grayscale
    • RGB Threshold to pull out the text: RGB Threshold
    • Erosion to connect the segmented text: Erosion
    • Dilation to reduce the amount of extruding pixels: Dilation
    • Document Skew (via AForge.Imaging) to rotate the image to the orientation of the text: Document Skew
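
    If you want to reproduce this pipeline outside .NET, here's a rough OpenCV equivalent. Note that whether erosion or dilation joins the segments depends on the text polarity after thresholding; with white-on-black text (as below), dilation does the joining. The kernel size, iteration counts, and deskew heuristic are all assumptions to tune per display:

        import cv2
        import numpy as np

        img = cv2.imread("display.jpg")

        # Gaussian Blur, then grayscale.
        blurred = cv2.GaussianBlur(img, (5, 5), 0)
        gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

        # Threshold to pull out the text (Otsu stands in for the RGB threshold here;
        # use THRESH_BINARY instead if your digits are brighter than the background).
        _, text = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

        # Join the segments, then trim extruding pixels.
        kernel = np.ones((3, 3), np.uint8)
        joined = cv2.dilate(text, kernel, iterations=2)
        cleaned = cv2.erode(joined, kernel, iterations=1)

        # Rough deskew via a rotated bounding rectangle around the text pixels
        # (a stand-in for AForge.Imaging's DocumentSkewChecker).
        coords = np.column_stack(np.where(cleaned > 0)).astype(np.float32)
        angle = cv2.minAreaRect(coords)[-1]
        if angle > 45:  # map OpenCV's angle convention to a small correction
            angle -= 90
        h, w = cleaned.shape
        M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
        deskewed = cv2.warpAffine(cleaned, M, (w, h),
                                  flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
        cv2.imwrite("processed.jpg", deskewed)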

    Since this display is classified as 'Segmented,' it then gets passed into Tesseract and analyzed using the 'LetsGoDigital' model, which is specialized for digital fonts.

    Tesseract Output: "rawText": "- 16.-9,,,6\n\f"
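
    Something like the following pytesseract call produces that raw output (a sketch; it assumes the 'letsgodigital' traineddata file has been dropped into your tessdata directory, and the page segmentation mode is a guess):

        import pytesseract
        from PIL import Image

        raw_text = pytesseract.image_to_string(
            Image.open("processed.jpg"),
            lang="letsgodigital",  # community traineddata for seven-segment digits
            config="--psm 7",      # assume a single line of text
        )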

    After some Regex, we're left with: "value": "-16.96"
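
    The cleanup step boils down to keeping the digits, the decimal point, and a leading minus sign (one way to do it; your displays may need more rules):

        import re

        raw_text = "- 16.-9,,,6\n\f"

        # Keep the digits and decimal point, then re-attach the sign.
        sign = "-" if raw_text.lstrip().startswith("-") else ""
        value = sign + re.sub(r"[^\d.]", "", raw_text)
        print(value)  # "-16.96"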

    Admittedly, this process isn't providing the best results, but it's sufficient to move forward. By refining the template, input images, Custom Vision Models, and the OCR process, we can expect to see better results in the future.

    It would be amazing to see Seven Segment Font natively supported by Microsoft's Computer Vision, as the current solution feels somewhat hacky. I'd prefer to continue using Computer Vision instead of Tesseract or any other OCR method, considering the nature of our application.