
Retrieve document content together with its structure using python-docx


I need to retrieve tables, together with the paragraphs immediately before and after them, from a docx file, but I can't figure out how to do this with python-docx.

I can get a list of paragraphs with document.paragraphs.

I can get a list of tables with document.tables.

How can I get an ordered list of document elements like this?

[
Paragraph1,
Paragraph2,
Table1,
Paragraph3,
Table2,
Paragraph4,
...
]

Solution

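  • Resolved by the Document.story property from the pull request below. It contains the document's paragraphs and tables in document order, so it is only available in a python-docx build that includes that pull request.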

    https://github.com/python-openxml/python-docx/pull/395

    from docx import Document

    document = Document('test.docx')
    elements = document.story  # paragraphs and tables, in document order