python python-3.x adobe acrobat

How to rebuild a PDF from the entire PDF file that Adobe returns as what looks like a binary string


I have a PDF with a submit button that sends the entire PDF in the body of a POST request to my API.

"PDF: Returns the entire PDF file with the user input." (https://helpx.adobe.com/acrobat/using/setting-action-buttons-pdf-forms.html)

However, it arrives in a very weird format, and I am really lost on how to rebuild the PDF from that "binary string" (I might be wrong about what it is).

This is what it looks like, but it is pretty long (all PDF binaries are long):

[1]: https://i.sstatic.net/3ZTiC.png

After I call .encode().decode().encode('utf-8') on it, it looks like this:

[screenshot]

I tried b64decode(t, validate=True), but it fails and says some characters cannot be decoded. I also tried .decode('windows-1252'), with the same result.

The Adobe Acrobat documentation is not really clear on how to proceed.

I would really appreciate and will upvote any suggestion or hint.


Solution

  • After days of looking, I found the cause.

    The problem is actually in AWS API Gateway itself. It converts the file content somehow before passing it through to the Lambda function, and that messes up the base64.

    This great article helped me fix it, and now simply calling base64.b64decode() on the body works well: https://medium.com/swlh/upload-binary-files-to-s3-using-aws-api-gateway-with-aws-lambda-2b4ba8c70b8e
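
    For anyone landing here, a minimal sketch of what the Lambda side can look like once API Gateway is set up to pass the body through as binary. It assumes a Lambda proxy integration, where the event carries "body" and "isBase64Encoded", and that the PDF's content type has been added to the API's binary media types. The names OUTPUT_PATH and extract_pdf_bytes are made up for illustration, and writing to the temp directory is just a stand-in for whatever you do with the file (S3 upload, parsing, etc.).

```python
import base64
import binascii
import os
import tempfile

# Hypothetical output location; on Lambda the temp dir is the writable /tmp.
OUTPUT_PATH = os.path.join(tempfile.gettempdir(), "submitted.pdf")


def extract_pdf_bytes(event):
    """Return the raw PDF bytes from an API Gateway proxy event, or None."""
    if not event.get("isBase64Encoded"):
        # The body was passed as text, so the binary data is likely
        # already corrupted and cannot be reliably recovered here.
        return None
    try:
        return base64.b64decode(event.get("body") or "", validate=True)
    except binascii.Error:
        return None


def lambda_handler(event, context):
    pdf_bytes = extract_pdf_bytes(event)
    # Every PDF file starts with the "%PDF-" header.
    if pdf_bytes is None or not pdf_bytes.startswith(b"%PDF-"):
        return {"statusCode": 400, "body": "Body is not a base64-encoded PDF"}

    # Write in binary mode so the bytes are stored unchanged.
    with open(OUTPUT_PATH, "wb") as f:
        f.write(pdf_bytes)

    return {"statusCode": 200, "body": f"Saved {len(pdf_bytes)} bytes"}
```

    If the handler keeps returning 400 with isBase64Encoded set to false, that usually points back to the API Gateway configuration rather than to the Python code.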