python python-docx

How to get all the text in a nested table using Python?


I have to extract all the text from a nested table (tables inside a table inside a table) in a Word document. I haven't been able to do it with python-docx, maybe because of my lack of knowledge.

Please suggest some code examples.


Solution

  • You will want some sort of recursion. The basic idea is:

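    from docx import Document

    def iter_paragraphs_of_tables(tables):
        for table in tables:
            for row in table.rows:
                for cell in row.cells:
                    # Paragraphs that sit directly in this cell
                    yield from cell.paragraphs
                    # Recurse into any tables nested inside this cell
                    yield from iter_paragraphs_of_tables(cell.tables)

    # "nested_tables.docx" is a placeholder; use the path to your own file
    document = Document("nested_tables.docx")
    for paragraph in iter_paragraphs_of_tables(document.tables):
        print(paragraph.text)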
    

    This is Python 3. Python 2 has no yield from, so there you'll need to expand each yield from statement (including the recursive one) into an explicit loop, for example:

    yield from cell.paragraphs
    # --- becomes ---
    for paragraph in cell.paragraphs:
        yield paragraph
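
To see the order in which the recursion visits text, here is a minimal sketch that runs the same generator over stand-in objects instead of a real document. The helpers `para`, `cell`, and `table` are hypothetical and are not part of python-docx; they only mimic the `.rows`, `.cells`, `.paragraphs`, `.tables`, and `.text` attributes that the generator reads.

```python
from types import SimpleNamespace


def iter_paragraphs_of_tables(tables):
    """Yield every paragraph in the given tables, depth-first."""
    for table in tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs
                yield from iter_paragraphs_of_tables(cell.tables)


# Hypothetical stand-ins (NOT python-docx classes), used only for illustration.
def para(text):
    return SimpleNamespace(text=text)


def cell(texts, tables=()):
    return SimpleNamespace(paragraphs=[para(t) for t in texts], tables=list(tables))


def table(*rows):
    return SimpleNamespace(rows=[SimpleNamespace(cells=list(r)) for r in rows])


# Three levels: outer table -> middle table -> inner table.
inner = table([cell(["inner A"]), cell(["inner B"])])
middle = table([cell(["middle"], [inner])])
outer = table([cell(["outer 1"], [middle]), cell(["outer 2"])])

texts = [p.text for p in iter_paragraphs_of_tables([outer])]
print(texts)
```

Each cell's own paragraphs come out first, followed by everything in its nested tables, before the generator moves on to the next cell. With a real document, passing `document.tables` in place of `[outer]` should behave the same way.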