python · azure · ocr · azure-cognitive-services

How to Increase Accuracy of Text Read in Images Using Microsoft Azure Computer Vision AI


I'm new to Microsoft Azure AI Computer Vision. I am using Cognitive Services and the Computer Vision client in a Python program to do two things:

  1. Extract text from a JPG Image using Optical Character Recognition (OCR)
  2. Use Cognitive Services to provide a Description of the Image

After lots of configuration issues (and pip installs!), I have achieved SOME results.

The code for extracting the text from the image is:

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

# Create a ComputerVisionClient
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))

image_path = '/Users/Owner/Documents/Bills Stuff/eBay/Images/Document_20240914_0008.jpg'

# Use Azure AI Cognitive Services to get the title and description of the image
# For the TITLE, use Optical Character Recognition (OCR) to read the text (caption) on the image
with open(image_path, "rb") as image_stream:
    ocr_results = client.recognize_printed_text_in_stream(image_stream)

if ocr_results.regions:
    for region in ocr_results.regions:
        for line in region.lines:
            print(f"   Title: {' '.join(word.text for word in line.words)}")

My second point, the description, is working great, BUT the code above is not extracting the text from the image accurately at all.

It's CLOSE, but the actual text is: "Scenic View of Horseshoe Curve on Pennsylvania Railroad"

The code I presented above returns: "Inside of Horseshoe Curve on Cina Railroad"

Is there a way to improve my code to make this result more accurate?

Adding: if I decrease/increase the size of my image, the code picks up more or fewer words. Maybe I need to somehow give the code more time to process the image?

Maybe someone could answer my question if I phrase it more broadly:

When I use the Microsoft Azure Computer Vision sample AI online tool found here: https://portal.vision.cognitive.azure.com/demo/extract-text-from-images

The text from the image is processed 100% correctly.

The output displays blue boxes around each block of text. I think these are called Bounding Boxes.

It appears that the accuracy improves when bounding boxes are used. Is that right?

Again, the Azure online Tool at the URL above is 100% correct.

My code does not use bounding boxes and is about 75% accurate.

Can someone point me in the right direction?


Solution

  • I suggest you look into combining Azure AI Vision v4.0 with Azure OpenAI GPT-4 Turbo with Vision.

    The concept is that you first process your image using the GPT-4 Turbo with Vision model, which analyzes the image and provides details about the locations of readable text in it. You should look into using the Vision Enhancement option.

    Using this information, your OCR results should improve. Keep in mind, though, that the cost of processing each individual image will increase, since you'll essentially be processing the image twice.

    You can review Microsoft's documentation regarding this here.
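    To make the two-pass idea concrete, here is a minimal sketch. It assumes the `openai` and `azure-ai-vision-imageanalysis` packages; the API version, the `gpt-4-vision` deployment name, and the prompt wording are placeholders you would replace with your own resource values, and the live calls require valid Azure credentials.

    ```python
    # Sketch of the two-pass approach: pass 1 asks GPT-4 Turbo with Vision
    # where readable text sits in the image; pass 2 runs Azure AI Vision v4.0
    # Image Analysis with the READ feature (the successor to the legacy
    # recognize_printed_text call used in the question).
    import base64


    def build_gpt4v_messages(image_bytes: bytes, prompt: str) -> list:
        """Build a chat-completions payload containing the text prompt plus
        the image encoded as a base64 data URL."""
        encoded = base64.b64encode(image_bytes).decode("ascii")
        return [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }]


    def locate_text_with_gpt4v(image_bytes: bytes, endpoint: str, key: str) -> str:
        """Pass 1: ask a GPT-4 Turbo with Vision deployment to describe where
        readable text appears. Deployment name and API version are assumptions."""
        from openai import AzureOpenAI  # pip install openai

        client = AzureOpenAI(azure_endpoint=endpoint, api_key=key,
                             api_version="2024-02-15-preview")
        response = client.chat.completions.create(
            model="gpt-4-vision",  # your deployment name
            messages=build_gpt4v_messages(
                image_bytes,
                "List every region of readable printed text in this image."),
            max_tokens=300,
        )
        return response.choices[0].message.content


    def read_text_v40(image_bytes: bytes, endpoint: str, key: str) -> list:
        """Pass 2: extract text lines with Azure AI Vision v4.0 Image Analysis.
        pip install azure-ai-vision-imageanalysis"""
        from azure.ai.vision.imageanalysis import ImageAnalysisClient
        from azure.ai.vision.imageanalysis.models import VisualFeatures
        from azure.core.credentials import AzureKeyCredential

        client = ImageAnalysisClient(endpoint=endpoint,
                                     credential=AzureKeyCredential(key))
        result = client.analyze(image_data=image_bytes,
                                visual_features=[VisualFeatures.READ])
        lines = []
        if result.read is not None:
            for block in result.read.blocks:
                lines.extend(line.text for line in block.lines)
        return lines
    ```

    In practice you would call `locate_text_with_gpt4v` first, then `read_text_v40` on the same bytes, which is why each image is billed twice, as noted above.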