Tags: flask, postman, postman-collection-runner, postman-testcase, microsoft-azure-documentdb

Why is raw data not accepting the format in POSTMAN?


This is my code, where I created an API using Flask and am testing it in Postman.

utils.py

import os
import base64
from urllib.parse import urlparse
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient

def get_client():
    endpoint = "endpoint"
    api_key = "apikey"
    client = DocumentIntelligenceClient(endpoint=endpoint,credential=AzureKeyCredential(api_key))
    return client

def is_file_or_url(input_string):
    if os.path.isfile(input_string):
        return 'file'
    elif urlparse(input_string).scheme in ['http', 'https']:
        return 'url'
    else:
        return 'unknown'

def load_file_as_base64(file_obj):
    # Read the contents of the file object
    data = file_obj.read()
    # Encode the data as base64
    base64_bytes = base64.b64encode(data)
    base64_string = base64_bytes.decode('utf-8')
    return base64_string

app.py

import os
from flask import Flask, request, jsonify
from pathlib import Path
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from utils import get_client, load_file_as_base64

app = Flask(__name__)

@app.route('/extract_invoice', methods=['POST'])
def extract_invoice():
    # Get the file from the request
    file = request.files['file']

    # Create the 'temp' directory if it doesn't exist
    temp_dir = 'temp'
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    # Save the file to disk
    file_path = os.path.join(temp_dir, file.filename)
    file.save(file_path)
    model_id = 'prebuilt-invoice'
    doc_source = Path(file_path)

    document_ai_client = get_client()

    with open(doc_source, 'rb') as file_obj:
        file_base64 = load_file_as_base64(file_obj)

    poller = document_ai_client.begin_analyze_document(
        model_id,
        {"base64Source": file_base64},
        locale="en-US",
    )

    result = poller.result()

    # Clean up the temporary file
    os.remove(file_path)

    # Extract the invoice details
    invoice_details = []
    for document in result.documents:
        document_fields = document['fields']
        fields = document_fields.keys()

        invoice_detail = {}
        for field in fields:
            if field == 'Items':
                items_list = []
                items = document_fields[field]

                for item in items['valueArray']:
                    item_fields = item['valueObject']
                    item_dict = {}
                    for item_field in item_fields.keys():
                        value = item_fields[item_field].get('content', '')
                        item_dict[item_field] = value
                    items_list.append(item_dict)
                invoice_detail[field] = items_list
            else:
                value = document_fields[field].get('content', '')
                invoice_detail[field] = value

        invoice_details.append(invoice_detail)

    return jsonify(invoice_details)

if __name__ == '__main__':
    app.run(debug=True)

I have tried everything I could think of, but it does not accept the file or its content and gives me the error: 'Incorrect format, please input the right format to import'. I also ran into this error: 'TypeError: cannot use a string pattern on a bytes-like object'.
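The TypeError quoted above is what Python's re module raises when a str pattern is matched against bytes, for example data read from a file opened in 'rb' mode or taken from request.data. The code in the question does not call re directly, so where the bytes enter in this particular stack is an assumption to check. The sketch below reproduces the error with an illustrative pattern and sample text, and shows two common fixes.

```python
import re

# Illustrative str pattern, similar to the invoice-parsing regexes used elsewhere
pattern = re.compile(r'TID:\s*(\d+)')

# Bytes, e.g. what you get from a file opened with 'rb' or from request.data
raw = b'TID: 75467009'

error_message = None
try:
    pattern.search(raw)  # str pattern + bytes input -> TypeError
except TypeError as exc:
    error_message = str(exc)

# Fix 1: decode the bytes to str before searching
match_str = pattern.search(raw.decode('utf-8'))

# Fix 2: keep the data as bytes and use a bytes pattern (note the rb'' prefix)
match_bytes = re.search(rb'TID:\s*(\d+)', raw)

print(error_message)
print(match_str.group(1), match_bytes.group(1))
```

Decoding is usually the simpler choice when the data is text; a bytes pattern avoids guessing an encoding when it is not.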

I get this error every time, but only for this particular image: a KeyError for the Subway invoice image. [Screenshots: the KeyError for the Subway invoice image; the Subway invoice image being used]
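One plausible, unconfirmed source of a KeyError that appears for a single image is the Items loop in app.py: items['valueArray'] and item['valueObject'] assume keys that a field may not carry for every document. Below is a minimal sketch of a defensive version of that loop. It assumes the fields support dict-style .get() access, as the code above already relies on with .get('content', ''). The helper names and the sample data are hypothetical, with plain dicts standing in for result.documents[0]['fields'].

```python
def field_content(field):
    """Return the 'content' of a field, or '' if the field or its content is missing."""
    return (field or {}).get('content', '')


def extract_invoice_fields(document_fields):
    """Flatten one analyzed document's fields without assuming optional keys exist."""
    invoice_detail = {}
    for name, field in document_fields.items():
        if name == 'Items':
            items_list = []
            # 'valueArray' may be absent, e.g. when no line items were recognized
            for item in field.get('valueArray') or []:
                # An item may lack 'valueObject'; fall back to an empty mapping
                item_fields = item.get('valueObject') or {}
                items_list.append(
                    {key: field_content(value) for key, value in item_fields.items()}
                )
            invoice_detail[name] = items_list
        else:
            invoice_detail[name] = field_content(field)
    return invoice_detail


# Hypothetical sample data shaped like the fields the question's code reads
sample_fields = {
    'VendorName': {'content': 'SUBWAY'},
    'InvoiceTotal': {'content': '$12.36'},
    'Items': {'valueArray': [
        {'valueObject': {'Description': {'content': 'Sandwich'},
                         'Amount': {'content': '$12.36'}}},
        {'type': 'object'},  # item without 'valueObject'
    ]},
}
details = extract_invoice_fields(sample_fields)
no_items = extract_invoice_fields({'Items': {'type': 'array'}})  # no 'valueArray'
print(details)
print(no_items)
```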


Solution

  • Your error is an HTTP 500 (Internal Server Error).

    It means the server hit an unhandled exception while document_ai_client.begin_analyze_document() was processing the document.
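A 500 only says that some exception went unhandled, so it helps to make Flask report which one. The minimal sketch below uses Flask's generic error handler to return the exception type and message as JSON, so it shows up in the Postman response, and logs the traceback on the server. The /boom route and its data are hypothetical and only simulate a KeyError like the one in the question.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException

app = Flask(__name__)


@app.errorhandler(Exception)
def handle_unexpected_error(exc):
    # Let Flask's own HTTP errors (404, 405, ...) pass through unchanged
    if isinstance(exc, HTTPException):
        return exc
    # Log the full traceback in the terminal running the server
    app.logger.exception("Unhandled error while processing request")
    # Send the exception type and message back to the client (e.g. Postman)
    return jsonify({"error": type(exc).__name__, "detail": str(exc)}), 500


@app.route('/boom', methods=['POST'])
def boom():
    # Hypothetical handler that reads a field the "document" does not have
    fields = {"InvoiceTotal": {"content": "$12.36"}}
    return jsonify(fields["Items"])


if __name__ == '__main__':
    client = app.test_client()
    print(client.post('/boom').get_json())
```

Returning exception details to the client is meant for debugging; in production you would normally log them and send a generic message instead.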

    It is not a base64 encoding or decoding issue.
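One way to check that the base64 step is not at fault is to round-trip the load_file_as_base64 helper from utils.py on arbitrary binary data. If the decoded bytes match the input, the encoding is lossless. This sketch only exercises the helper, not what the Azure service does with the payload. The 256-byte test payload is an arbitrary stand-in for an image.

```python
import base64
import io


def load_file_as_base64(file_obj):
    # Same logic as the helper in utils.py: read bytes, base64-encode, return str
    return base64.b64encode(file_obj.read()).decode('utf-8')


# Every possible byte value, including bytes that are not valid UTF-8
original = bytes(range(256))
encoded = load_file_as_base64(io.BytesIO(original))
decoded = base64.b64decode(encoded)

print(len(encoded), decoded == original)
```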

    I built a mock server of my own that reads the image and extracts its text as key/value pairs.

    This does not fix your server directly, but it shows that the problem lies in your server.

    Overview


    Step 1: Create the conda demo_env environment

    Download and install Anaconda3

    Launch Anaconda Prompt


    Create demo_env with Python 3.8:

    conda create --name demo_env python=3.8
    


    Switch to the demo_env environment and install Flask and EasyOCR:

    conda activate demo_env
    
    pip install flask easyocr
    


    Step 2: Create utility.py and app.py

    File tree


    utility.py

    import re
    import easyocr

    # Create the OCR reader once; loading the EasyOCR model on every call is slow
    reader = easyocr.Reader(['en'])

    def extract_invoice_details(image_path):
        # Run OCR and join the recognized text fragments, one per line
        result = reader.readtext(image_path)
        full_text = '\n'.join(detection[1] for detection in result)

        # Regexes tuned to this receipt's OCR output (e.g. ';' recognized instead of ':')
        patterns = {
            'Amount': r'Amount\s*;\s*\$(\d+),(\d+)',
            'Application': r'Application:\s*(.*)',
            'AID': r'AID\s*:\s*(\w+)',
            'MiD': r'MiD:\s*(\d+)',
            'TID': r'TID:\s*(\d+)',
            'Date/Time': r'Date/T\s*ime;\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})'
        }

        extracted_items = {}
        for key, pattern in patterns.items():
            match = re.search(pattern, full_text)
            if match:
                if key == 'Amount':
                    # '$12,36' -> '$12.36'
                    extracted_items[key] = f"${match.group(1)}.{match.group(2)}"
                elif key == 'Date/Time':
                    # '06/09/2021 12:54:29' -> '06/09/2021; 12:54:29'
                    extracted_items[key] = match.group(1).replace(' ', '; ')
                else:
                    extracted_items[key] = match.group(1)
        return extracted_items
    

    app.py

    import os
    from flask import Flask, request, jsonify
    from utility import extract_invoice_details
    
    app = Flask(__name__)
    
    @app.route('/extract_invoice', methods=['POST'])
    def extract_invoice():
        # Get the file from the request
        file = request.files['file']
    
        # Create the 'temp' directory if it doesn't exist
        temp_dir = 'temp'
        if not os.path.exists(temp_dir):
            os.makedirs(temp_dir)
    
        # Save the file to disk
        file_path = os.path.join(temp_dir, file.filename)
        file.save(file_path)
    
        # Use the utility function to process the image
        invoice_details = extract_invoice_details(file_path)
    
        # Clean up the temporary file
        os.remove(file_path)
    
        return jsonify(invoice_details)
    
    if __name__ == '__main__':
        app.run(debug=True)
    

    subway.jpg

    Save your invoice image locally under this name.

    demo.py

    import easyocr
    reader = easyocr.Reader(['en'])  # 'en' is for English, you can add other languages as needed
    result = reader.readtext('subway.jpg')
    for detection in result:
        print(detection[1])  # Prints out extracted text
    

    Step 3: Extract text from subway.jpg

    python demo.py
    


    demo_v2.py

    from utility import extract_invoice_details
    
    def main():
        # Specify the path to the image file
        image_path = 'subway.jpg'
        
        # Call the function from utility.py to extract invoice details
        invoice_details = extract_invoice_details(image_path)
        
        # Print the extracted details
        print("Extracted Invoice Details:")
        for key, value in invoice_details.items():
            print(f"{key}: {value}")
    
    if __name__ == '__main__':
        main()
    

    This code extracts only six key/value pairs.

    Extract specific data with regex: using predefined regular expressions, the script searches the aggregated text for Amount, Application, AID, MiD, TID, and Date/Time, and reformats some of these values for consistency and clarity before returning them. These are the matching lines in the OCR output:

    Amount ; $12,36
    Application: VISA CREDIT
    AID : AO000000031010
    MiD: 420429002208556
    TID: 75467009
    Date/T ime; 06/09/2021 12:54:29
    

    It also normalizes two of the entries:

    Amount ; $12,36 ->  Amount: $12.36
    Date/T ime; 06/09/2021 12:54:29  -> Date/Time: 06/09/2021; 12:54:29
    
    Run demo_v2.py:

    python demo_v2.py
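To try the regex step in isolation, without installing EasyOCR, you can feed utility.py's patterns the OCR lines listed above. The detections list is a hypothetical stand-in shaped like the (bounding box, text, confidence) tuples that easyocr's readtext() returns. The bounding boxes and confidence values are dummies; only the text is used.

```python
import re

# Hypothetical stand-in for readtext() output; the text values are the OCR lines above
detections = [
    (None, 'Amount ; $12,36', 0.9),
    (None, 'Application: VISA CREDIT', 0.9),
    (None, 'AID : AO000000031010', 0.9),
    (None, 'MiD: 420429002208556', 0.9),
    (None, 'TID: 75467009', 0.9),
    (None, 'Date/T ime; 06/09/2021 12:54:29', 0.9),
]

# Same patterns as utility.py
patterns = {
    'Amount': r'Amount\s*;\s*\$(\d+),(\d+)',
    'Application': r'Application:\s*(.*)',
    'AID': r'AID\s*:\s*(\w+)',
    'MiD': r'MiD:\s*(\d+)',
    'TID': r'TID:\s*(\d+)',
    'Date/Time': r'Date/T\s*ime;\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})',
}


def extract_from_text(full_text):
    """Apply the patterns to already-recognized text and normalize two values."""
    extracted_items = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, full_text)
        if not match:
            continue
        if key == 'Amount':
            # Treat the comma as a decimal point: '$12,36' -> '$12.36'
            extracted_items[key] = f"${match.group(1)}.{match.group(2)}"
        elif key == 'Date/Time':
            # Separate date and time with '; '
            extracted_items[key] = match.group(1).replace(' ', '; ')
        else:
            extracted_items[key] = match.group(1)
    return extracted_items


# Join the text fragments the same way utility.py does
full_text = '\n'.join(detection[1] for detection in detections)
details = extract_from_text(full_text)
for key, value in details.items():
    print(f"{key}: {value}")
```

Because the patterns match OCR quirks such as ';' in place of ':' and "Date/T ime", they are specific to this receipt and will likely need adjusting for other images.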
    


    Step 4: Run the Flask server

    python app.py
    


    Step 5: Call the API from Postman with subway.jpg

    URL

    POST http://localhost:5000/extract_invoice
    

    Body: select form-data.

    Set the key to file (key type File) and choose subway.jpg as the value.
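Both handlers read the upload through request.files['file']. That means the image has to arrive as multipart/form-data with a part named file. A raw or binary body leaves request.files empty, and request.files['file'] then raises Werkzeug's BadRequestKeyError, which Flask turns into a 400 response. The sketch below uses Flask's test client to show both cases. The stripped-down handler, which only reports the file name and size, is hypothetical and stands in for the OCR endpoint.

```python
import io

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/extract_invoice', methods=['POST'])
def extract_invoice():
    # request.files is only populated for multipart/form-data bodies
    if 'file' not in request.files:
        return jsonify({"error": "send the image as form-data with key 'file'"}), 400
    upload = request.files['file']
    data = upload.read()
    return jsonify({"filename": upload.filename, "size": len(data)})


client = app.test_client()

# Like Postman: Body -> form-data, key "file" (type File)
ok = client.post('/extract_invoice',
                 data={'file': (io.BytesIO(b'fake image bytes'), 'subway.jpg')},
                 content_type='multipart/form-data')

# Like Postman: Body -> raw / binary (no multipart part named "file")
raw = client.post('/extract_invoice', data=b'fake image bytes',
                  content_type='image/jpeg')

print(ok.status_code, ok.get_json())
print(raw.status_code, raw.get_json())
```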


    Press Send.

    Response body:

    {
        "AID": "AO000000031010",
        "Amount": "$12.36",
        "Application": "VISA CREDIT",
        "Date/Time": "06/09/2021; 12:54:29",
        "MiD": "420429002208556",
        "TID": "75467009"
    }
    


    Summary

    I believe the issue lies with your server's internal configuration, not with the base64 encoding or Postman.