python boto3 text-extraction amazon-textract

Textract Unsupported Document Exception

I'm trying to use boto3 to run an Amazon Textract detect_document_text request.

I'm using the following code:

import boto3

client = boto3.client('textract')
response = client.detect_document_text(
    Document={
        'Bytes': image_b64['document_b64']
    }
)

Here image_b64['document_b64'] is a base64-encoded string of an image, which I generated using, for example, the https://base64.guru/converter/encode/image website.
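A quick way to see why Textract might reject this input is to compare the leading bytes of the raw image with those of its base64 text. The sketch below checks for the file signatures of JPEG, PNG, PDF and TIFF, the formats the DetectDocumentText documentation lists as accepted. The SIGNATURES table, detect_format and the sample header are illustrative names, not part of boto3 or Textract:

```python
import base64
from typing import Optional

# File signatures ("magic numbers") for JPEG, PNG, PDF and TIFF.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF": "PDF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return the detected format name, or None if no known signature matches."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# Hypothetical stand-in for an image: a PNG signature followed by padding.
raw = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
# This mimics the text an online base64 converter produces.
encoded = base64.b64encode(raw)

print(detect_format(raw))      # PNG
print(detect_format(encoded))  # None: the base64 text starts with b"iVBORw0K"
```

The raw bytes are recognised as a PNG. The base64 version is plain ASCII text that matches none of the image signatures.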

But I'm getting the following error:

UnsupportedDocumentException

What am I doing wrong?

Solution

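For future reference, I solved the problem by decoding the base64 string back into raw bytes before passing it to Textract. The 'Bytes' field expects the raw image bytes, and boto3 does the base64 encoding of the request itself. An already-encoded string therefore ends up encoded twice, and Textract does not recognise it as an image: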

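import base64

import boto3

client = boto3.client('textract')
# Decode the base64 string back into the raw image bytes Textract expects
# (bytearray() is not needed, and naming the variable "bytes" would shadow the builtin)
image_bytes = base64.b64decode(image_b64['document_b64'])
response = client.detect_document_text(
    Document={
        'Bytes': image_bytes
    }
)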
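If the base64 text comes from different sources, a small helper can normalise it before the call. This is a sketch under a few assumptions. The name to_textract_document is hypothetical. The optional "data:image/...;base64," prefix is handled because some converters include it. Whitespace is stripped in case the string was wrapped across lines:

```python
import base64


def to_textract_document(b64_string: str) -> dict:
    """Hypothetical helper: build the Document argument for detect_document_text."""
    # Drop an optional data-URI prefix such as "data:image/png;base64,"
    if b64_string.startswith("data:") and "," in b64_string:
        b64_string = b64_string.split(",", 1)[1]
    # Remove line breaks/spaces that wrapped base64 output may contain
    b64_string = "".join(b64_string.split())
    # validate=True raises binascii.Error on invalid characters instead of skipping them
    raw = base64.b64decode(b64_string, validate=True)
    # boto3 base64-encodes this blob itself when sending the request
    return {"Bytes": raw}


# Usage with boto3 (needs AWS credentials, so it is left commented out here):
# import boto3
# client = boto3.client("textract")
# response = client.detect_document_text(
#     Document=to_textract_document(image_b64["document_b64"])
# )
# lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
```

If the image is already on disk, you can skip base64 entirely and pass the contents of open(path, 'rb').read() as 'Bytes'.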