I'm trying to OCR this image using Google's Document AI, but it seems to output the text in completely the wrong order.
Here's the image:
The output is as follows:
piece of clothing, = do up zip up something with difficulty. zip something up you fasten it using a zip. She zipped up the dress Hezipped his jeans up.
It seems to first split the image in half, which it shouldn't, and then read each half separately. But the image is already split, and is intended to be read line by line.
How can I tell Document AI to just read the image line by line?
This is the Python code I'm using:
from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import documentai


def quickstart(
    project_id: str,
    location: str,
    processor_id: str,
    file_path: str,
    mime_type: str,
    processor_version_id: Optional[str] = None,
):
    # You must set the api_endpoint if you use a location other than 'us'.
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    if processor_version_id:
        # The full resource name of the processor version, e.g.:
        # projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}
        name = client.processor_version_path(
            project_id, location, processor_id, processor_version_id
        )
    else:
        # The full resource name of the processor, e.g.:
        # projects/{project_id}/locations/{location}/processors/{processor_id}
        name = client.processor_path(project_id, location, processor_id)

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Load Binary Data into Document AI RawDocument Object
    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    # Configure the process request
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result = client.process_document(request=request)

    # For a full list of Document object attributes, please reference this page:
    # https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document
    document = result.document

    # Read the text recognition output from the processor.
    # `f` is an output file handle opened elsewhere in the script (not shown here).
    f.write(file_path + "\n")
    f.write(document.text)
Document AI OCR may include paragraph/block text in a different order than expected in the API response due to the varying ways that text can be portrayed on a page.
You can retrieve the bounding box of each paragraph from Document.pages[].paragraphs[].layout.boundingPoly
and use it to determine the order in which to handle the text (e.g. top to bottom, left to right).
You can refer to "Handle the processing response" for more information on how this response is structured. You can also try this demo to see how the blocks and paragraphs are extracted.
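The bounding-box reordering described above could be sketched roughly as follows. This is a minimal illustration, not an official Document AI feature: the helper names `layout_text` and `read_in_rows` and the `row_tolerance` threshold are hypothetical, and the code assumes the processor populated `normalized_vertices` on each paragraph's bounding polygon. It groups paragraphs whose top edges are close together into one row, then reads each row left to right.

```python
def layout_text(layout, full_text):
    """Return the slice(s) of the document text that a layout element points to."""
    return "".join(
        full_text[int(segment.start_index):int(segment.end_index)]
        for segment in layout.text_anchor.text_segments
    )


def read_in_rows(document, row_tolerance=0.02):
    """Reorder paragraphs top to bottom, then left to right within each row.

    row_tolerance (hypothetical, in normalized page units) is how far apart two
    top edges may be while still counting as the same visual row; it will
    likely need tuning per document.
    """
    output_rows = []
    for page in document.pages:
        items = []
        for paragraph in page.paragraphs:
            vertices = paragraph.layout.bounding_poly.normalized_vertices
            top = min(vertex.y for vertex in vertices)
            left = min(vertex.x for vertex in vertices)
            text = layout_text(paragraph.layout, document.text).strip()
            items.append((top, left, text))

        # Sort by top edge so that rows can be built in a single pass.
        items.sort()

        rows = []  # each entry: (top of first item in the row, [(left, text), ...])
        for top, left, text in items:
            if rows and abs(top - rows[-1][0]) <= row_tolerance:
                rows[-1][1].append((left, text))
            else:
                rows.append((top, [(left, text)]))

        for _, cells in rows:
            output_rows.append(" ".join(text for _, text in sorted(cells)))
    return "\n".join(output_rows)
```

With the question's code, this would be called as `read_in_rows(document)` in place of writing `document.text` directly. If paragraphs turn out to be too coarse for a two-column layout like this one, iterating over `page.lines` instead of `page.paragraphs` may work better, since both expose the same `layout` fields.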