Tags: python, boto3, text-extraction, amazon-textract

Textract Unsupported Document Exception


I'm trying to use boto3 to make a Textract detect_document_text request.

I'm using the following code:

client = boto3.client('textract')
response = client.detect_document_text(
    Document={
        'Bytes': image_b64['document_b64']
    }
)

Here image_b64['document_b64'] is a base64-encoded image string that I produced with an online converter, for example https://base64.guru/converter/encode/image.

But I'm getting the following error:

UnsupportedDocumentException

What am I doing wrong?


Solution

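  • For future reference, I solved the problem by decoding the base64 string back to raw bytes before passing it to Textract. boto3 base64-encodes the Bytes value itself when it builds the request, so a string that is already base64 ends up encoded twice: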

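    import base64
    import boto3

    client = boto3.client('textract')
    # Decode the base64 string back to the raw image bytes.
    # b64decode already returns bytes, so no bytearray wrapper is needed.
    document_bytes = base64.b64decode(image_b64['document_b64'])
    response = client.detect_document_text(
        Document={
            'Bytes': document_bytes
        }
    )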
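
If the base64 string comes from an online converter, it may also carry a data-URI prefix such as "data:image/png;base64,", and the same exception can show up when the decoded bytes are not in a format Textract accepts. The sketch below is one possible way to guard against both cases before calling the API. The helper names (decode_document, sniff_format, detect_text) are made up for illustration, and the list of signatures assumes Textract accepts JPEG, PNG, PDF, and TIFF, which is worth checking against the current AWS documentation.

```python
import base64

# File signatures for the formats assumed to be accepted by Textract
# (assumption: JPEG, PNG, PDF, TIFF; verify against the AWS docs).
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF": "pdf",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
}


def decode_document(document_b64):
    """Return raw bytes from a base64 string, dropping any data-URI prefix."""
    if document_b64.startswith("data:") and "," in document_b64:
        document_b64 = document_b64.split(",", 1)[1]
    return base64.b64decode(document_b64)


def sniff_format(raw):
    """Return a format name if the bytes start with a known signature, else None."""
    for signature, name in _SIGNATURES.items():
        if raw.startswith(signature):
            return name
    return None


def detect_text(document_b64):
    """Hypothetical wrapper: decode, sanity-check, then call Textract."""
    import boto3  # imported here so the helpers above work without boto3

    raw = decode_document(document_b64)
    if sniff_format(raw) is None:
        raise ValueError("decoded bytes do not look like JPEG, PNG, PDF, or TIFF")
    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": raw})
```

With this in place, detect_text(image_b64['document_b64']) would replace the direct client call. If the image is already on disk, reading it with open(path, 'rb').read() and passing those bytes avoids the base64 round trip entirely.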