Tags: python, python-2.7, python-docx

How do you read a table from a specific part of a Word document using python-docx?


I am using Python to read a Word file that contains many tables. I need to extract data only from certain tables, depending on the sections they appear in. Is there any way to search through the file, reach a certain line, and read the table that appears after that line?

For example, if the Word document is something like:

1
2
3
[table]
4
5
6
[table]

Would I be able to read the table specifically after the '6'?

Reading the 'second table' by index would not work, because the number of tables that appear before it is arbitrary; I need to identify it by the fact that it appears after the '6'.


Solution

  • The code here may be of interest: https://github.com/python-openxml/python-docx/issues/276#issuecomment-199502885.

    What you're looking for, I believe, is a way to iterate over the block-level items in a document in the order they appear. A Word document has two types of block-level items: paragraphs and tables. The function at the link above lets you iterate over both in document order, so you can watch for the paragraph you care about and then take the next table.
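
As a rough illustration of that idea, here is a minimal sketch. It is not the linked code verbatim. It walks the children of the document body, wraps each one as a python-docx Paragraph or Table, and returns the first table found after a paragraph whose text matches a marker. The helper names first_table_after and read_table_after are made up for this example, and the sketch assumes the marker is a whole paragraph (such as '6') and that the table sits at the top level of the document body rather than inside another table.

```python
def iter_block_items(document):
    """Yield ("paragraph", Paragraph) or ("table", Table) pairs in document order."""
    # python-docx is imported inside the function so that the plain-Python
    # helper below can be used without the library installed.
    from docx.oxml.table import CT_Tbl
    from docx.oxml.text.paragraph import CT_P
    from docx.table import Table
    from docx.text.paragraph import Paragraph

    # document.element.body is the <w:body> element; its children are the
    # block-level items, in the order they appear in the file.
    for child in document.element.body.iterchildren():
        if isinstance(child, CT_P):
            yield "paragraph", Paragraph(child, document)
        elif isinstance(child, CT_Tbl):
            yield "table", Table(child, document)


def first_table_after(tagged_blocks, marker_text):
    """Return the first table that follows a paragraph equal to marker_text.

    Hypothetical helper: tagged_blocks is any iterable of (kind, block) pairs
    such as the one produced by iter_block_items. Returns None if the marker
    or a following table is not found.
    """
    marker_seen = False
    for kind, block in tagged_blocks:
        if kind == "paragraph" and block.text.strip() == marker_text:
            marker_seen = True
        elif kind == "table" and marker_seen:
            return block
    return None


def read_table_after(path, marker_text):
    """Return the cell text of the matching table as a list of rows, or None."""
    from docx import Document

    document = Document(path)
    table = first_table_after(iter_block_items(document), marker_text)
    if table is None:
        return None
    return [[cell.text for cell in row.cells] for row in table.rows]
```

With the example document from the question, read_table_after(path, "6") would skip the table after '3' and return the rows of the table after '6', however many tables precede it. If the marker can appear more than once, or is only part of a longer paragraph, the comparison in first_table_after would need to be adjusted.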