python, amazon-web-services, amazon-s3, python-docx

Issue accessing a .docx file stored in an AWS S3 bucket with python-docx


I have a .docx file in my AWS S3 bucket and need to read it using python-docx. I wrote this:

from docx import Document
document = Document('https://my-first-backup-bucket-v1.s3-ap-southeast-1.amazonaws.com/New+Proposed+Quote.docx')

It fails with this error:

PackageNotFoundError: Package not found at 'https://my-first-backup-bucket-v1.s3-ap-southeast-1.amazonaws.com/New+Proposed+Quote.docx'

Why does this happen?

When I open the same URL in a browser, the file opens successfully. For testing purposes I made the file publicly accessible, so anyone can try it. Can anyone please help with this?


Solution

  • From Document objects — python-docx 0.8.10 documentation:

    docx.Document(docx=None)

    Return a Document object loaded from docx, where docx can be either a path to a .docx file (a string) or a file-like object. If docx is missing or None, the built-in default document “template” is loaded.

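    In other words, the argument must be either a path to a local .docx file or a file-like object. The documentation does not say that a URL is accepted, so python-docx treats the URL string as a local file path, finds no package there, and raises PackageNotFoundError.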

    Therefore, you should download the file from Amazon S3 first, then pass Document either the path to the downloaded copy on the local file system or a file-like object holding its contents.
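
The download step could be sketched as follows. This is a minimal sketch, not the only way to do it: it assumes the object is publicly readable over HTTPS (as in the question) and uses only the standard library's urllib for the download. The helper names fetch_to_buffer, download_to_temp_file and load_docx_from_url are hypothetical names chosen for illustration; they are not part of python-docx.

```python
import io
import shutil
import tempfile
import urllib.request


def fetch_to_buffer(url):
    """Download the resource at `url` into a seekable in-memory buffer."""
    with urllib.request.urlopen(url) as response:
        return io.BytesIO(response.read())


def download_to_temp_file(url, suffix=".docx"):
    """Download the resource at `url` to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            return tmp.name


def load_docx_from_url(url):
    """Fetch a .docx over HTTP(S) and open it with python-docx."""
    # Imported here so the download helpers stay usable without python-docx.
    from docx import Document

    # Document() accepts a file-like object, so no local file is needed.
    # Document(download_to_temp_file(url)) would be the local-path variant.
    return Document(fetch_to_buffer(url))
```

Calling load_docx_from_url with the URL from the question should then return a Document whose paragraphs can be iterated as usual. For a private bucket a plain HTTPS request would not be enough; an authenticated client such as boto3 (for example its S3 client's download_fileobj method writing into a BytesIO buffer) would be one option, which is not shown here.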