I want to extract the text from a table in a .docx file using Python for further analysis. I'm using the following code:
```python
from docx import Document  # provided by the python-docx package

document = Document(path_to_your_docx)
tables = document.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)
```
But there seems to be another "table" nested inside a cell of this table, and I can't extract that part (shown in the attached image): with the code above, the "Yes/No" text is never printed.
I have also tried to iterate through the cell as if it were a table, but I get an error saying the cell doesn't have a table attribute. Any advice?
Thanks.
I found a workaround. Instead of using python-docx to extract the text from the .docx file, I used the docx2txt library, which extracts all of the document's text as a single string, and then searched that string for the specific word.
```python
import docx2txt

text = docx2txt.process(file)
q = "Example1"
# Take the first whitespace-separated token that follows the label
result = text[text.find(q)+len(q):].split()[0]
```
This gives me the "Yes" or "No" from Column2 for the corresponding value in Column1 (in the example above, it gives "Yes").
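One caveat with the slicing line: `str.find` returns -1 when the label is missing, so `text[text.find(q)+len(q):]` then starts at a meaningless offset and returns an unrelated word instead of failing, and `.split()[0]` raises `IndexError` if nothing follows the label. A slightly more defensive version of the same lookup, as a sketch (the helper name `value_after` is hypothetical):

```python
def value_after(text, label):
    """Return the first whitespace-separated token after `label` in `text`,
    or None if the label is missing or nothing follows it."""
    start = text.find(label)
    if start == -1:
        # str.find returns -1 for a missing label; slicing from there
        # would silently return an unrelated word.
        return None
    tokens = text[start + len(label):].split()
    return tokens[0] if tokens else None
```

Used as `value_after(docx2txt.process(file), "Example1")`. It still matches substrings, so a label such as "Example1" also matches the start of "Example10"; if labels can be prefixes of one another, comparing whole lines of the extracted text is safer.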