python, vba, ms-word, extract

Extracting tables from a Word document


Is there a tool that extracts all tables from a Word document and converts them to CSV or an Excel file, using Python or VBA?

Note that the Word file contains both text and tables.


Solution

  • You can use pandas with python-docx. Per this answer, you can extract every table from a document and collect them in a list of DataFrames:

    from docx import Document
    import pandas as pd

    document = Document('test.docx')

    tables = []
    for table in document.tables:
        # Pre-fill a rows x columns grid with empty strings.
        grid = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
        for i, row in enumerate(table.rows):
            for j, cell in enumerate(row.cells):
                # Copy the text of every non-empty cell into the grid.
                if cell.text:
                    grid[i][j] = cell.text
        tables.append(pd.DataFrame(grid))
    

    You can then save each table to its own CSV file by looping through the list:

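    for nr, df in enumerate(tables):
        # index=False and header=False keep pandas' automatic 0, 1, 2, ... labels out of the file.
        df.to_csv(f"table_{nr}.csv", index=False, header=False)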
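If pandas is not otherwise needed, the same export can be sketched with only python-docx and the standard-library csv module. This is a minimal variant, not the linked answer's method. The helper names table_to_rows, write_tables and export_docx_tables are made up for illustration, and the code assumes each table object exposes .rows, .cells and .text the way python-docx tables do. Unlike the snippet above, it strips surrounding whitespace from each cell.

```python
import csv
from pathlib import Path


def table_to_rows(table):
    """Return the table's cell text as a list of rows (lists of strings).

    Works with any object exposing .rows -> .cells -> .text, which is the
    shape python-docx Table objects have. Cell text is stripped of
    leading and trailing whitespace.
    """
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def write_tables(tables, out_dir="."):
    """Write each table to table_<n>.csv inside out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for nr, table in enumerate(tables):
        path = out_dir / f"table_{nr}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(table_to_rows(table))
        paths.append(path)
    return paths


def export_docx_tables(docx_path, out_dir="."):
    """Hypothetical convenience wrapper: open a .docx and export its tables."""
    from docx import Document  # imported lazily; requires python-docx

    return write_tables(Document(docx_path).tables, out_dir)
```

Calling export_docx_tables('test.docx', 'out') would then write out/table_0.csv, out/table_1.csv and so on, one file per table. Text that sits between the tables is ignored, because only document.tables is read.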