I am having issues parsing text from a document that contains images.
I am using python-docx 0.7.0 on a machine running Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-60-generic x86_64).
I am using this logic:
```
from docx import Document

def get_text(path):  # function name is illustrative
    document = Document(path)
    # Get all paragraphs
    paras = document.paragraphs
    text = ""
    # Append each paragraph's text to a single string
    for para in paras:
        # Don't forget the line break
        text += "\n" + para.text
    return text.strip()
```
When the document contains an image, this process fails.
Is there something I am doing wrong?
python-docx should support what you're trying to do here. If you provide the stack trace you get when the error is raised, I'll take a look.
Btw, you can code this a little more elegantly as:
```
document = Document(path)
text = '\n'.join([para.text for para in document.paragraphs])
```
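To collect the stack trace requested above, something like the following might help. It is only a sketch: the helper name `extract_text_with_diagnostics` is made up for illustration, and it assumes the failure happens while reading `para.text` rather than while opening the file. It uses only the standard-library `traceback` module and works on any iterable of objects with a `.text` attribute, such as `document.paragraphs`.

```python
import traceback


def extract_text_with_diagnostics(paragraphs):
    """Join paragraph text, recording a traceback for each paragraph that fails.

    `paragraphs` is any iterable of objects exposing a `.text` attribute,
    for example `document.paragraphs` from python-docx.
    (Hypothetical helper, not part of python-docx.)
    """
    lines = []
    failures = []  # list of (paragraph index, formatted traceback string)
    for index, para in enumerate(paragraphs):
        try:
            lines.append(para.text)
        except Exception:
            # Keep going so every failing paragraph is reported, not just the first
            failures.append((index, traceback.format_exc()))
    return "\n".join(lines), failures
```

Calling `text, failures = extract_text_with_diagnostics(Document(path).paragraphs)` and printing each entry in `failures` would show which paragraph raised and the full trace to paste into a reply. If `Document(path)` itself raises, this helper never runs, and the trace printed by the interpreter is the one to share.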