python, ms-word, data-extraction

How do I extract all data from multiple tables in many Word documents in Python (data extraction directly from MS Word)?


I tried the code below, but it can only open a single document and print its cell text.

The problem is that I have 67 Word documents with similar tables. How do I extract all the data from the tables in each of the 67 documents?

Currently the code below opens only one document and extracts the cell text from all of its tables. I would like to open multiple Word documents in a folder using the same code. Is there a way to do that? Please take a look at the code below, thanks!

from docx import Document

wordDoc = Document(r"C:\Users\user\Documents\Lynn\FYPJ P3\FYP (Updated Ver)\FYP\dataprep\documents_sampling\860305644_Cat_5_Patient Care Record (Inpatient Nursing)_Admission.docx")
for table in wordDoc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)

Solution

  • You can just loop over the files in the folder with os.listdir:

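    import os
    from docx import Document
    
    path = '\\some\\path\\to\\folder'
    worddocs_list = []
    for filename in os.listdir(path):
        # Skip non-Word files and the "~$" lock files Word creates while a document is open
        if filename.lower().endswith('.docx') and not filename.startswith('~$'):
            wordDoc = Document(os.path.join(path, filename))
            worddocs_list.append(wordDoc)
    
    # Print the text of every cell in every table of every document
    for wordDoc in worddocs_list:
        for table in wordDoc.tables:
            for row in table.rows:
                for cell in row.cells:
                    print(cell.text)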
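
If you want to keep the data rather than just print it, here is a minimal sketch of one way to collect the tables per file. The function names (find_docx_files, extract_tables, extract_folder) are made up for this example and are not part of python-docx. It assumes all documents sit directly in one folder and that only top-level tables matter, since wordDoc.tables does not include tables nested inside cells.

```python
from pathlib import Path


def find_docx_files(folder):
    """Return the .docx files directly inside folder, skipping Word lock files."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".docx"
        and not p.name.startswith("~$")  # "~$..." files appear while a document is open in Word
    )


def extract_tables(docx_path):
    """Return every top-level table in one document as a list of rows of cell strings."""
    # Imported here so find_docx_files can be used even where python-docx is not installed.
    from docx import Document

    doc = Document(str(docx_path))
    return [
        [[cell.text for cell in row.cells] for row in table.rows]
        for table in doc.tables
    ]


def extract_folder(folder):
    """Map each file name to the list of tables found in that document."""
    return {p.name: extract_tables(p) for p in find_docx_files(folder)}
```

Calling extract_folder on your folder path should give you a dict in which each file name maps to its tables, each table being a list of rows and each row a list of cell strings. From there you can write the rows to CSV or load them into whatever structure you need.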