I'm getting an error when I try to access files from my working directory. This is not a duplicate of @dsphoebe's question, because the file I'm trying to open is certainly a .docx file and, for whatever reason, I can't open it.
import os

rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(file)  # prints all word docs in my folder, just like I want
Now when I replace that print statement with a statement that creates a docx object,
import os
import docx

rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        doc = docx.Document(os.path.join(rootdir, file))
        ...
        # continuing with what I wanted to do w/ the documents...
Error: "Package not found at '%s'" % pkg_file
docx.opc.exceptions.PackageNotFoundError: Package not found at 'my doc.docx'
Now, in that mydocs folder, my doc.docx is the correct file name and it certainly is a .docx file. This particular file's name is made up of two words separated by one space (i.e. my doc.docx). But the conversion to a 'Document' object works for another Word doc in that same folder whose name is only ONE word!
This works:
rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
doc = docx.Document(os.path.join(rootdir, "Access.docx"))
Exited with code = 0
But this doesn't:
rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
doc = docx.Document(os.path.join(rootdir, "Able2Extract Professional.docx"))
Exited with code=1
So two words.docx wouldn't work but oneword.docx would. Very confusing. Does anyone know how to diagnose this problem?
Where are you using the rootdir variable?
If docx is just trying to open 'my doc.docx' by concatenating that string with the current directory, it might not be properly escaping the space character in the filename.
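One way to check this is to compare where a bare filename points with where the file actually lives. The sketch below is only a diagnostic aid: resolve_candidates is a hypothetical helper name, and it assumes the error comes from the path being looked up relative to the current working directory. It uses only the standard library, so it runs without python-docx.

```python
import os


def resolve_candidates(rootdir, filename):
    """Return (path resolved against the current directory, path under rootdir).

    Hypothetical helper for diagnosis only; it does not open the file.
    """
    # A bare name such as 'my doc.docx' is resolved against os.getcwd().
    relative_to_cwd = os.path.abspath(filename)
    # Joining with rootdir gives the location inside the folder being scanned.
    under_rootdir = os.path.join(rootdir, filename)
    return relative_to_cwd, under_rootdir


def report(rootdir, filename):
    """Print both candidate paths and whether a file exists at each one."""
    for path in resolve_candidates(rootdir, filename):
        print(repr(path), "exists:", os.path.isfile(path))
```

Calling report(rootdir, "my doc.docx") prints both candidates. If only the rootdir one exists, the bare name is what fails, and the space in the name is probably not the culprit.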
Try using os.path.join():
import os.path
import docx

rootdir = 'C:\\Users\\me\\Documents\\Python\\mydocs\\'
doc = docx.Document(os.path.join(rootdir, "my doc.docx"))
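If you are looping with os.walk, the same idea can be sketched as below. This is a guess at what the loop needs, not a confirmed fix: find_docx_paths is a hypothetical helper, it joins each name with the folder os.walk is currently visiting (subdir) so files in subfolders resolve too, and it assumes you want to skip the "~$..." owner files that Word leaves next to a document while it is open, since those are not valid .docx packages.

```python
import os


def find_docx_paths(rootdir):
    """Collect full paths of .docx files under rootdir, including subfolders."""
    paths = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # Assumption: skip Word's temporary owner files such as '~$my doc.docx'.
            if name.startswith("~$"):
                continue
            if name.lower().endswith(".docx"):
                # Join with subdir (the folder being visited), not rootdir,
                # so a file inside a subfolder still gets its real path.
                paths.append(os.path.join(subdir, name))
    return sorted(paths)
```

Each returned path can then be passed to docx.Document(path). Printing the list first makes it easy to see exactly which path fails.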