
Package not found error on windows in python-docx?


I'm getting an error when I try to open files from my working directory. This is not a duplicate of @dsphoebe's question, because the file I'm trying to open is definitely a .docx file, yet for whatever reason I can't open it.

import os

rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(file)    # prints all Word docs in my folder, just like I want

Now when I replace that print statement with a statement that creates a docx object,

import os
import docx

rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        doc = docx.Document(os.path.join(rootdir, file))
        ...
        # continuing with what I wanted to do with the documents...

Error: "Package not found at '%s'" % pkg_file
docx.opc.exceptions.PackageNotFoundError: Package not found at 'my doc.docx'

Now, in that mydocs folder, my doc.docx is the correct file name and it is definitely a .docx file. The name consists of two words separated by one space (i.e. my doc.docx). But creating a Document object works for another Word doc in the same folder whose name is only ONE word!

This works:

    rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
    doc = docx.Document(os.path.join(rootdir, "Access.docx"))
    Exited with code = 0

But this doesn't:

    rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
    doc = docx.Document(os.path.join(rootdir, "Able2Extract Professional.docx"))
    Exited with code=1

So two words.docx wouldn't work but oneword.docx would. Very confusing. Anyone know how to diagnose this problem?


Solution

  • Where are you using the rootdir variable?

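    The error message shows only the bare file name ('my doc.docx'), which suggests docx is being handed a relative path and is resolving it against the current working directory rather than rootdir. A space in a file name does not need to be escaped when the path is passed as a Python string, so the space by itself is unlikely to be the cause.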

    Try building the full path with os.path.join():

    import os.path
    import docx

    rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
    doc = docx.Document(os.path.join(rootdir, "my doc.docx"))
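
For the os.walk() loop from the question, here is a minimal sketch of how the full paths could be collected before anything is handed to docx. It rests on two assumptions that the question does not confirm. First, the folder may contain subfolders, in which case each file name has to be joined with the directory currently being walked rather than with rootdir. Second, Word may have left hidden "~$" lock files in the folder; these end in .docx but are not valid packages, and as far as I can tell python-docx reports them with the same PackageNotFoundError. The helper name find_docx_files is made up for illustration.

```python
import os


def find_docx_files(rootdir):
    """Return full paths of .docx files under rootdir, including subfolders.

    Hypothetical helper for illustration; it is not part of python-docx.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in filenames:
            # Assumption: skip Word's "~$..." lock files and anything
            # that does not have a .docx extension.
            if name.startswith("~$") or not name.lower().endswith(".docx"):
                continue
            # Join with dirpath (the folder being walked), not rootdir,
            # so files inside subfolders also get a correct path.
            paths.append(os.path.join(dirpath, name))
    return paths
```

Each returned path can then be passed to docx.Document(path). Printing the path just before that call makes it easy to see whether a failing file really lives where the code expects it to.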