python, docx, python-docx

How to extract text from a table in a .docx file?


I want to extract the text from a table in a .docx file using Python for further analysis. I'm using the following code:

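from docx import Document  # provided by the python-docx package

document = Document(path_to_your_docx)
tables = document.tables
# Print the text of every paragraph in every cell of each top-level table.
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)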

But it seems there is another "table" inside a cell of this table, and I'm not able to extract that part (shown in the attached image). When I use the code above, I can't fetch the "Yes/No" text.

I have also tried iterating through the cell as if it were a table, but I get an error saying the cell doesn't have a table attribute. Any advice?

The table looks like this

code behind table creation

Thanks.


Solution

  • I have a workaround for this issue. Instead of using python-docx to extract the text from the .docx file, I used docx2txt, which extracts all the text as a single string, and then searched that string for the specific word.

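    import docx2txt

    text = docx2txt.process(file)  # file: path to the .docx
    
    q = "Example1"
    # Take the first whitespace-delimited token that follows the label.
    result = text[text.find(q)+len(q):].split()[0]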
    

    This gives me the "Yes" or "No" from Column2 for each value in Column1 (in the example above, it gives "Yes").
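
One caveat with the lookup above: if the label is missing, text.find(q) returns -1 and the slice silently yields an unrelated token. A slightly more defensive version might look like the sketch below. The helper name value_after and the sample string are made up for illustration; the sample only stands in for whatever docx2txt.process(file) returns for the real document.

```python
def value_after(text, label):
    """Return the first whitespace-delimited token after `label`, or None."""
    start = text.find(label)
    if start == -1:
        return None  # label not present in the extracted text
    tokens = text[start + len(label):].split()
    return tokens[0] if tokens else None


# Hypothetical stand-in for the string returned by docx2txt.process(file).
sample = "Column1\n\nColumn2\n\nExample1\n\nYes\n\nExample2\n\nNo\n"

print(value_after(sample, "Example1"))  # Yes
print(value_after(sample, "Example3"))  # None
```

Note that this still matches the label as a plain substring, so "Example1" would also match the start of "Example10"; if the labels can overlap like that, a stricter match would be needed.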