python google-cloud-vision

Google Cloud Vision, lumping lines together


I'm testing Google Cloud Vision. I want it to read across the page in sequence, line by line. Here is the code:

from google.cloud import vision

url = 'https://www.sec.gov/Archives/edgar/data/1633917/000163391720000091/q120paypalearningsreleas013.jpg'


def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web.
    """
    client = vision.ImageAnnotatorClient()
    # google-cloud-vision 2.0+ exposes Image directly;
    # older releases used vision.types.Image() instead.
    image = vision.Image()
    image.source.image_uri = uri

    response = client.text_detection(image=image)

    # Check for an API error before reading the annotations.
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))

    texts = response.text_annotations
    print('Texts:')

    # The first annotation is the full text; the rest are individual words.
    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = ['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices]

        print('bounds: {}'.format(','.join(vertices)))


if __name__ == '__main__':
    detect_text_uri(url)

You can see it does pretty well until it gets to "Payment Transactions per active account", then it lumps that line together with the next one. It's no longer going line by line.

How do I fix this? I've looked through the docs, but I'm already using the text detection feature, so I'm not sure how to improve the result further.


Solution

  • Google Vision is not configurable at this level.

    You have two options for reading text in a document:

    TEXT_DETECTION: Run text detection / optical character recognition (OCR). Text detection is optimized for areas of text within a larger image; if the image is a document, use DOCUMENT_TEXT_DETECTION instead.

    DOCUMENT_TEXT_DETECTION: Run dense text document OCR. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present.

    If TEXT_DETECTION and DOCUMENT_TEXT_DETECTION both return the same unsatisfying answer, you have to modify the image itself.
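Trying the second feature is a small change to the question's code: call `document_text_detection` instead of `text_detection` and read `full_text_annotation`. A minimal sketch, assuming google-cloud-vision 2.0 or later (where `vision.Image` replaces `vision.types.Image`). `read_document_text` and `main` are hypothetical helper names, and the client is passed in so the helper can be exercised without credentials:

```python
def read_document_text(client, image):
    """Run DOCUMENT_TEXT_DETECTION and return the recognized text."""
    response = client.document_text_detection(image=image)
    # Check for an API error before trusting the annotations.
    if response.error.message:
        raise RuntimeError(response.error.message)
    # full_text_annotation.text holds the text of the whole image as one string.
    return response.full_text_annotation.text


def main(uri):
    # Imported lazily so the helper above has no hard dependency on the library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = uri
    print(read_document_text(client, image))
```

Calling `main(url)` with the URL from the question prints the text as the document-oriented model reads it, which can then be compared against the TEXT_DETECTION output.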

    For example, using the Cloud demo API, you can see results immediately.

    I slightly changed the image and got better results for this specific line.

    [Image: cropped, with additional contrast] [Image: result]

    Keep in mind this is just an example; you need to find a way of modifying the image that works well enough for your documents.
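The kind of modification described above (cropping and adding contrast) could be scripted along these lines. This is a sketch only, assuming the third-party Pillow library is installed. `crop_box` and `preprocess` are hypothetical names, and the crop bounds and contrast factor are placeholder values, not the ones used for the result above:

```python
def crop_box(width, height, left=0.0, top=0.0, right=1.0, bottom=1.0):
    """Convert fractional bounds (0..1) into a pixel box (left, top, right, bottom)."""
    return (
        int(width * left),
        int(height * top),
        int(width * right),
        int(height * bottom),
    )


def preprocess(src_path, dst_path, bounds=(0.0, 0.0, 1.0, 1.0), contrast=1.5):
    """Crop an image and boost its contrast before sending it to OCR."""
    # Pillow is imported lazily; install it with `pip install Pillow`.
    from PIL import Image, ImageEnhance

    with Image.open(src_path) as img:
        box = crop_box(img.width, img.height, *bounds)
        # Convert to RGB first so the enhancer works regardless of source mode.
        cropped = img.convert("RGB").crop(box)
        # A factor above 1.0 increases contrast; 1.0 leaves the image unchanged.
        enhanced = ImageEnhance.Contrast(cropped).enhance(contrast)
        enhanced.save(dst_path)
```

For instance, `preprocess("page.jpg", "page_crop.jpg", bounds=(0.0, 0.25, 1.0, 0.5))` would keep the second quarter of the page vertically. The saved file can then be read as bytes and passed to Vision with `vision.Image(content=...)` instead of an image URI.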

    EDIT: It may also be worth exploring Document AI.
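If changing the image is not enough, another option is to regroup the words client-side, using the bounding boxes that the question's code already prints. This is not a Vision API feature, just a rough post-processing sketch: `group_into_rows` is a hypothetical helper, and the `tolerance` value (in pixels) is an assumption that would need tuning per document. It expects the per-word entries, that is `response.text_annotations[1:]`, since the first entry is the full text:

```python
def group_into_rows(annotations, tolerance=10):
    """Group word annotations into rows by the vertical center of their boxes."""

    def center(annotation):
        vertices = annotation.bounding_poly.vertices
        xs = [v.x for v in vertices]
        ys = [v.y for v in vertices]
        return sum(xs) / len(xs), sum(ys) / len(ys)

    rows = []  # each entry: [reference_y, [(x, word), ...]]
    # Walk the words from top to bottom and start a new row whenever the
    # vertical gap to the current row exceeds the tolerance.
    for annotation in sorted(annotations, key=lambda a: center(a)[1]):
        x, y = center(annotation)
        if rows and abs(y - rows[-1][0]) <= tolerance:
            rows[-1][1].append((x, annotation.description))
        else:
            rows.append([y, [(x, annotation.description)]])

    # Within each row, order the words left to right.
    return [" ".join(word for _, word in sorted(words)) for _, words in rows]
```

This assumes a page that is not rotated or skewed. For tilted scans, rows would need to be fitted along the text angle instead of compared on raw y values.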