python, python-docx

Get text from table in .docx file using python


I need to get the full text of a document as a Python string, so I use the docx library (python-docx):

import docx

doc = docx.Document(user_file)
fullText = []
for para in doc.paragraphs:
    fullText.append(para.text)
text = '\n'.join(fullText)

It works, but it ignores text in tables. How can I get the data from tables? Is there a way to strip the tags or otherwise prepare the document? Thanks in advance!


Solution

  • doc.tables returns a list of Table instances corresponding to the tables in the document, in document order. Note that only tables appearing at the top level of the document appear in this list; a table nested inside a table cell does not appear. A table within revision marks such as <w:ins> or <w:del> will also not appear in the list.
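
Building on doc.tables, here is a minimal sketch of how the table text could be collected alongside the paragraph text. The helper names table_text and document_text are made up for this example, and the tab and newline separators are just one possible choice. It relies only on the rows, cells and text attributes that python-docx exposes on tables, rows and cells.

```python
def table_text(table):
    """Return the text of one table: one line per row, cells separated by tabs."""
    lines = []
    for row in table.rows:
        # cell.text is the text of all paragraphs inside the cell
        lines.append('\t'.join(cell.text for cell in row.cells))
    return '\n'.join(lines)


def document_text(doc):
    """Return paragraph text followed by the text of every top-level table."""
    parts = [para.text for para in doc.paragraphs]
    parts.extend(table_text(table) for table in doc.tables)
    return '\n'.join(parts)


# Usage with python-docx (assumes the package is installed and that
# user_file is a path or file-like object, as in the question):
#   import docx
#   text = document_text(docx.Document(user_file))
```

Note that this appends all table text after the paragraph text, so the original interleaving of paragraphs and tables is not preserved. Tables nested inside a cell are skipped as well; they should be reachable through cell.tables if you need them.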