Issue reading text with python-docx when the document contains images

I am having issues parsing text from a document that contains images.

I am using python-docx 0.7.0 on Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-60-generic x86_64).

I am using this logic:

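```
from docx import Document


# The function name is illustrative; the original snippet was the body of a function.
def extract_text(path):
    document = Document(path)
    # Get all paragraphs
    paras = document.paragraphs

    text = ""

    # Append the text of each paragraph to a single string
    for para in paras:
        # Don't forget the line break
        text += "\n" + para.text

    # Drop the leading line break and any surrounding whitespace
    return text.strip()
```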

When the document contains an image, this process fails.

Is there something I am doing wrong?

Solution

python-docx should support what you're trying to do here. If you provide the stack trace you get when the error is raised, I'll take a look.
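In case it helps with collecting that trace, here is a minimal sketch that uses only the standard library `traceback` module. The helper name `run_and_capture` is hypothetical and is not part of python-docx; it just calls whatever function wraps your parsing logic and hands back the full traceback text if anything is raised.

```python
import traceback


def run_and_capture(func, *args):
    """Call func(*args) and return (result, None) on success,
    or (None, traceback_text) if an exception is raised.

    Hypothetical helper for illustration; not part of python-docx.
    """
    try:
        return func(*args), None
    except Exception:
        # format_exc() returns the same text Python would print to stderr
        return None, traceback.format_exc()
```

Call it with your parsing function and the path to the failing document, then paste the second element of the returned tuple into the question if it is not `None`.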

Btw, you can code this a little more elegantly as:

```
document = Document(path)
text = '\n'.join([para.text for para in document.paragraphs])
```