I need to find (and extract) the section heading of certain tables in a DOCX file.
The problem is that there may be empty paragraphs, or even other tables, before a relevant table, so I'd need to iterate backwards until I reach a heading of any level.
For example:
```
Document
  Heading
  (paragraphs)
  Table 1
  Subheading
  (paragraphs)
  (irrelevant table)
  Table 2
```
My starting point is as follows:
```python
from docx import Document

doc = Document(infile)
for i, table in enumerate(doc.tables):
    for previous paragraph:  # <=== How can I iterate backwards?
        if paragraph.style.name.startswith('Heading'):
            heading = paragraph.text
            break
```
Thanks in advance!
You should use the Document object's `iter_inner_content()` method, documented here: https://python-docx.readthedocs.io/en/latest/api/document.html#docx.document.Document.iter_inner_content
`Document.iter_inner_content()` yields both paragraphs (`Paragraph` objects) and tables (`Table` objects) in the order they appear in the document body. Instead of iterating backwards from each table, you can iterate forwards and keep track of the current heading: update a variable each time you reach a heading paragraph, then read it whenever you reach a table.